Confluence for 5.26.24
Recapping Google's announcements. Are devices the next frontier? The dilemma of the “Senior Employee” metaphor. Anthropic resources for prompt design.

Welcome to Confluence. Here’s what has our attention this week at the intersection of generative AI and corporate communication:
Recapping Google’s Announcements
Are Devices the Next Frontier?
The Dilemma of the “Senior Employee” Metaphor
Anthropic Resources for Prompt Design
Recapping Last Week’s Google Announcements
Google shares how it’s bringing Gemini to more of its products and services.
While OpenAI’s announcement of GPT-4o dominated the conversation last week, Google also made a series of significant announcements at its annual developer conference, Google I/O. The key takeaway? Google is embedding AI into nearly all its products and services.
From asking Gemini questions about your Google Photos library to receiving AI-generated summaries of your Google search results, artificial intelligence is set to become a ubiquitous presence across Google’s ecosystem. While we won’t cover every detail shared at the conference, here are a few noteworthy highlights:
Gemini is coming to Workspace. Paid subscribers will soon have access to Gemini 1.5 Pro, Google's most advanced language model, directly within Google Docs, Gmail, Sheets, and other Workspace apps. Users will be able to search, ask questions, and summarize documents or emails without leaving their workflow. We haven’t had the opportunity to test this integration yet, but we're curious to see how it compares to Office 365 Copilot, which we’ve found to be limited in its current form.
Google has an answer to GPTs — Gems. Gems will allow Gemini users to create and save specific ways of interacting with Gemini, giving it different personas, following distinct conversation flows, etc. While Google didn’t show much about exactly how they will work or how users will create them, we don’t expect Gems to differ dramatically from our experience with GPTs. That said, they do represent another step toward more personalized and customizable AI.
Astra gives us a look at the future of AI assistants. Perhaps the most technically impressive demo at Google I/O was the unveiling of Astra, a multi-modal, personalized AI assistant. Google did not share a firm release date, but the demo is well worth watching.
Google has shared more examples of what Astra will be capable of doing, and we’d encourage you to spend time with it. The possibilities are remarkable.
The capabilities showcased at Google I/O reinforce the point we made last week — the tools people will use outside of the workplace are becoming increasingly sophisticated. As the gap widens between what individuals experience in their personal lives and what they encounter at work, organizations will need to be thoughtful about encouraging employees to share how they use this technology and how it can help them in their work.
Are Devices the Next Frontier?
Hints and announcements from Microsoft, Apple, and Amazon suggest so.
Last week, we wrote about the implications of GPT-4o, OpenAI’s newest generative AI model, becoming available to everyone, everywhere for free. Our view is that the more widely available these powerful models become, the more people will begin using them in their personal lives, heightening their expectations for access to similarly powerful tools at work. The GPT-4o announcement is part of a larger trend we’ve seen over the past year of companies integrating generative AI into the software applications people use every day: the Microsoft Office suite, Facebook, Google search, and many more. Mostly, these integrations have been limited to software applications in the cloud. In the past few weeks, however, there have been increasing indications of a new frontier on which a similar dynamic will begin to play out — our physical devices.
The most concrete of these indications is Microsoft’s introduction of a new class of Copilot+ PCs. In a blog post introducing the Copilot+ PCs, Microsoft’s Yusuf Mehdi emphasizes that “We have completely reimagined the entirety of the PC – from silicon to the operating system, the application layer to the cloud – with AI at the center, marking the most significant change to the Windows platform in decades.” While Microsoft’s announcement garnered a fair share of media attention last week, Microsoft is not the only PC maker investing in computers optimized for AI. In The Wall Street Journal, Dan Gallagher notes that “The PC makers Dell Technologies, Lenovo Group and others also are embracing on-device AI for their Windows-powered machines, with major launches coming later this year.”
Beyond PCs, Apple and Amazon have hinted at major announcements of their own regarding upgraded on-device AI. According to reports, Apple plans to upgrade Siri, and Amazon appears similarly poised to overhaul Alexa. Why does this matter? Alexa-powered devices are already in hundreds of millions of households, and the iPhone “currently takes 85 percent of global smartphone profits and generates more than $200 billion in sales,” per the New York Times article linked above. Bringing the power of a GPT-4o-class model to those devices will only accelerate the wave we wrote about last week. When these upgrades come, for hundreds of millions of people around the world, there will be no need to go to a website, download an app, or create an account to access leading-edge AI. It will just “be there” — on your phone, in your kitchen, and everywhere else.
Time will tell, of course, how all of this plays out. We expect these device and hardware developments to follow a similar trajectory to the one we’ve seen play out with software over the past 18 months. There will be a class of new, “AI-native” devices (like Microsoft’s Copilot+ PCs), while powerful AI also makes its way into the devices we already own and use every day. And as we wrote last week: once people get a taste of this technology at home, they’re going to want it at work.
The Dilemma of the “Senior Employee” Metaphor
Thinking of advanced AI as a senior employee raises some important questions that every organization will need to reckon with.
In a recent podcast appearance, OpenAI CEO Sam Altman used the metaphor of a “senior employee” as one way of thinking about advanced AI:
[One way of thinking about advanced AI] is a great senior employee … One of the things that I like about a senior employee is they’ll push back on me. They will sometimes not do something I ask, or they sometimes will say, ‘I can do that thing if you want, but if I do it, here’s what I think would happen …’ … It can reason and it has the kind of relationship with me that I would expect out of a really competent person that I worked with, which is different from, like, a sycophant.
Ethan Mollick, writing about GPT-4o, uses a similar frame of thinking about AI as a coworker (the emphasis below is ours):
If an AI that seems to reason like a human being can see and interact and plan like a human being, then it can have influence in the human world. This is where AI labs are leading us: to a near future of AI as coworker, friend, and ubiquitous presence. I don’t think anyone, including OpenAI, has a full sense of all of the implications of this shift, and what it will mean for all of us.
We agree with Altman and Mollick that it is helpful to think about — and engage with — AI just as you would a person. We also agree with Mollick, however, that there are going to be some very important implications of this shift and what it means — particularly for organizations’ growth and development strategies for employees. Matt Beane, in a post on “The Specter of Skills Inequality,” forecasts one of them (again, the emphasis below is ours):
Give us technology that allows us to get rapid productivity gains if we reduce novice involvement in the work? On average, we’ll do it. From this point of view, the “safe” prediction from all the research I’m aware of — mine and others’ — is that we will continue to seek immediate productivity from genAI and other intelligent technologies.
We have a new class of technologies that increasingly approximates the capabilities of a senior employee, and there are no indications of that momentum slowing. As Beane points out, the research suggests that when we can use technology for rapid productivity gains — gains other research shows today’s generative AI tools are more than capable of delivering, along with gains in quality — we will do it. Often, we might do so at the expense of others who need that experience.
Thus the dilemma, one of many. Senior employees do not arrive fully formed. They become senior employees by developing skills through repetition and judgment through experience. Every organization right now has junior employees who still need those repetitions and experiences to develop the skills and judgment of a senior employee. A question every organization should be asking is how to make sure that happens in an era when AI can increasingly handle the tasks traditionally performed by junior employees, a shift that, without an intentional approach, could limit their opportunities for growth and development.
Anthropic Resources for Prompt Design
The creators of Claude offer several tools to help you craft better prompts.
One of the moving targets with generative AI is prompt design — the act (or science, or perhaps art) of crafting prompts for an LLM like ChatGPT or Claude. A year ago, many were forecasting that “prompt engineering” wouldn’t just be a new skill but a trade in itself. As the models have advanced, though, they’ve become better at interpreting a user’s query, and the importance of prompt design has become a matter of debate.
As the capabilities of large language models continue to expand, some argue that the importance of prompt engineering is diminishing. After all, with more advanced models like GPT-4 and Claude 3, users can often get satisfactory results with simpler, more straightforward prompts. For basic queries and general-purpose tasks, it may seem that the art of crafting the perfect prompt is becoming less critical.
However, we believe that prompt design is not just becoming less important — it’s also becoming more important, particularly for highly specific and nuanced use cases. Take, for example, the challenge of getting an LLM to adopt the right voice and style for a particular writing task. Many have found that without careful prompt engineering, LLMs can struggle to capture the subtleties of tone, pacing, and vocabulary needed to convincingly ghostwrite in a specific style. But with a thoughtfully designed prompt that includes detailed style guidelines and relevant examples, these models can produce remarkably effective results. The very paragraphs you’re reading now¹ illustrate this point: they were generated by our Claude Confluence Ghostwriter after we provided just six sentences of instruction. The output required no further editing, demonstrating the power of a well-crafted prompt in eliciting the desired style and quality of writing from an LLM.
So for now at least, we think prompt design still matters quite a bit. Just deleting the word “warm” or “friendly” from the persona guidance in a prompt can have a dramatic effect on how an LLM responds. Adding a section on how to weigh the salience of the advice it recalls can make a big difference in what it treats as important. And instructions that lack coherence (like asking for concision in one place and detail in another) can confound a model.
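If you want to see this for yourself, the quickest way is to run the same request under two persona prompts that differ by a single word. Here is a minimal sketch using Anthropic’s Python SDK. Treat it as an illustration rather than a recipe: the model name is current as of this writing, and the persona text and question are our own inventions.

import anthropic

# Ask the same question under two persona prompts that differ by one word.
# Assumes the anthropic package is installed and ANTHROPIC_API_KEY is set.
client = anthropic.Anthropic()

question = "Draft a three-sentence note announcing our CEO's retirement."
personas = [
    "You are a warm, experienced communication counselor.",
    "You are an experienced communication counselor.",  # "warm" deleted
]

for persona in personas:
    response = client.messages.create(
        model="claude-3-opus-20240229",
        max_tokens=300,
        system=persona,  # the persona guidance under test
        messages=[{"role": "user", "content": question}],
    )
    print(persona)
    print(response.content[0].text)
    print("---")

Running the two side by side makes the shift in tone easy to spot. The same comparison works in any chat interface; the script just puts the outputs next to each other.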
You don’t need to be a software engineer to do this well, but you do need to practice: make changes to prompts and pay attention to the output you get. You can also study, because there’s a lot to learn from how the experts write the prompts they use with LLMs. Most of the large generative AI labs publish resources on prompt design. We are fans of Claude for communication work, as it just seems to have a better, more human voice and approach to language than the other models. It’s a great combination of smart and articulate. Anthropic has a prompt engineering guide here, which we’ve pointed to before, but they also have three other resources that might be helpful:
A prompt library of pre-written Claude prompts. Some of these may have direct utility for you, and some may not … but you can learn an awful lot by noting how they are written — their structure and tone. One example is this prompt in which Claude will review and cite sources from an uploaded document:
You are an expert research assistant. Here is a document you will answer questions about:

[Full text of document not pasted here for brevity]

First, find the quotes from the document that are most relevant to answering the question, and then print them in numbered order. Quotes should be relatively short. If there are no relevant quotes, write “No relevant quotes” instead.

Then, answer the question, starting with “Answer:”. Do not include or reference quoted content verbatim in the answer. Don’t say “According to Quote [1]” when answering. Instead make references to quotes relevant to each section of the answer solely by adding their bracketed numbers at the end of relevant sentences.

Thus, the format of your overall response should look like what’s shown between the <example> tags. Make sure to follow the formatting and spacing exactly.

<example>
Quotes:
[1] “Company X reported revenue of $12 million in 2021.”
[2] “Almost 90% of revenue came from widget sales.”

Answer:
Company X earned $12 million. [1] Almost 90% of it was from widget sales. [2]
</example>

If the question cannot be answered by the document, say so.
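To put a library prompt like this one to work in your own code, you splice the document and your question into the template and send it all as a single message. A minimal sketch, with our own stand-in file name and question; the template below is abbreviated to its opening lines, and we wrap the document in XML tags, the demarcation convention Anthropic recommends:

import anthropic

# Splice a document and a question into the cite-your-sources prompt above.
# Assumes ANTHROPIC_API_KEY is set; "report.txt" is a stand-in for your document.
template = """You are an expert research assistant. Here is a document you will answer questions about:
<doc>
{document}
</doc>

[... remainder of the library prompt shown above ...]

{question}"""

with open("report.txt") as f:
    document = f.read()

client = anthropic.Anthropic()
response = client.messages.create(
    model="claude-3-opus-20240229",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": template.format(
            document=document,
            question="What revenue did Company X report in 2021?",
        ),
    }],
)
print(response.content[0].text)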
Anthropic’s Prompt Engineering Interactive Tutorial. You need an Anthropic API key to actually do the tutorial (keys are free, and you can get one on Anthropic’s website), but even without one there’s a lot to learn just from reading through the sections and seeing the examples.
Anthropic’s experimental Prompt Generator, where you paste in instructions and it creates a prompt for you. Again, you need an API key for access, and again, keys are free. The output can be a bit technical, because this is really a tool for developers to get prompts to embed in applications, but the language and structure of the prompts are still very instructive. We often use it to revise or critique a prompt we’ve already developed. Here’s an example of its output, starting with this instruction: “Allow the user to chat with an uploaded library of prior newsletter articles. They should be able to ask for advice based on the collective contents, extract specific topics, summarize points of view, and otherwise have a virtual chat with the authors based on the contents.”
Here is a library of newsletter articles that I would like you to use as a knowledge base to answer questions from users:

<library>
{{LIBRARY}}
</library>

Please read through the library carefully. You will use the information in these articles to engage in a dialogue with the user, answering their questions, providing advice, extracting and summarizing relevant points, and chatting with them from the perspective of the articles' authors.

Here is the user's query:

<query>
{{QUERY}}
</query>

First, in a <scratchpad> section, brainstorm some initial thoughts on how you could respond to this query using the information in the article library. Consider:

- What articles or passages are most relevant to the query?
- How can you synthesize information from multiple articles into a coherent response?
- What would be the perspective or advice of the articles' authors on this topic?
- How can you engage with the user conversationally while still grounding your response in the library content?

After brainstorming, provide your final response to the user in an <answer> tag. Aim to directly address their query while providing relevant information, advice, opinions and conversation inspired by the article library. Use specific details from the articles to support your points. Adopt a friendly, engaging tone as if you are the collective voice of the articles' authors. But avoid presenting any information or views that are not grounded in the library content.
Taking out the code, we would revise that to:
Attached is a library of newsletter articles that I would like you to use as a knowledge base to answer questions from users. Please read through the library carefully. You will use the information in these articles to engage in a dialogue with the user, answering their questions, providing advice, extracting and summarizing relevant points, and chatting with them from the perspective of the articles' authors.

First, brainstorm some initial thoughts on how you could respond to this query using the information in the article library. Consider:

- What articles or passages are most relevant to the query?
- How can you synthesize information from multiple articles into a coherent response?
- What would be the perspective or advice of the articles' authors on this topic?
- How can you engage with the user conversationally while still grounding your response in the library content?

After brainstorming, provide your final response to the user. Aim to directly address their query while providing relevant information, advice, opinions and conversation inspired by the article library. Use specific details from the articles to support your points. Adopt a friendly, engaging tone as if you are the collective voice of the articles' authors. But avoid presenting any information or views that are not grounded in the library content.
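If you do keep the generator’s raw output, the {{LIBRARY}} and {{QUERY}} placeholders are meant to be filled in at run time, which is exactly what makes it useful to developers. A minimal sketch of that substitution, with our own file names as stand-ins:

import anthropic

# Fill the generator's {{LIBRARY}} and {{QUERY}} placeholders at run time.
# Assumes ANTHROPIC_API_KEY is set; the file names are our own stand-ins.
with open("library_chat_prompt.txt") as f:  # the raw template above
    template = f.read()
with open("newsletter_archive.txt") as f:   # your article library
    library = f.read()

query = "What have the authors said about prompt design?"
prompt = template.replace("{{LIBRARY}}", library).replace("{{QUERY}}", query)

client = anthropic.Anthropic()
response = client.messages.create(
    model="claude-3-opus-20240229",
    max_tokens=1024,
    messages=[{"role": "user", "content": prompt}],
)
print(response.content[0].text)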
If you don’t have access to the generator (and don’t want to get an API key), we’ve posted an edited version of the “helper metaprompt” in the footnotes,² and you can see the whole thing here. We gave Claude Opus the edited helper metaprompt with this pasted in the [your query request would go here] space: “Write a prompt that helps the user think through investment options given a particular situation or scenario they will provide,” and it gave us these instructions:
Analyze the investment scenario provided by the user in the following text:

<scenario>
$scenario$
</scenario>

First, acknowledge that you understand the user's situation and investment question. Express that you are happy to help them think through some options.

Next, ask the user for some additional context that will help you provide the most relevant guidance:

- What are the user's primary investment goals? (e.g. growth, income, capital preservation, etc.)
- What is the user's approximate investment time horizon? How many years until they need to access the money?
- What is the user's risk tolerance? Are they comfortable with the potential for significant short-term losses in pursuit of higher long-term returns, or do they prefer a more conservative approach?

Wait for the user to provide this additional context. Then, keeping their scenario, goals, time horizon and risk tolerance in mind, suggest 2-3 investment options that may be appropriate for their situation. For each option, briefly explain:

- What the investment is (e.g. stocks, bonds, real estate, commodities, etc.)
- The key benefits and risks of the investment
- Why it may be a good fit given the user's particular circumstances

After presenting the options, remind the user that this is just a starting point to get them thinking in the right direction. Emphasize that they should always do their own in-depth research and consult with a qualified financial professional before making any investment decisions, as each individual's situation is unique.

Close by wishing them the best of luck with their investment planning and invite them to let you know if they have any other questions.
We changed the first line to read “Analyze the investment scenario provided by the user,” took the rest, and pasted it into the instructions field of a custom GPT (along with a line describing the GPT’s persona as an investment advisor) to make a test investment advisor GPT that you can see here (we probably don’t need to say it, but don’t use it for investment advice). Between Claude and ChatGPT, we went from a thesis to a custom chatbot with professionally designed instructions in just a few minutes.
Eventually the technology may make prompt design meaningless, but at least for now, we think it matters a great deal. While the standard models are now so smart that they can answer all sorts of things with ease regardless of the prompt (e.g. “How do I fix yellowing on the leaves of an orchid?” or “Give me guidance and exercises for getting better at editing others’ writing”), for more sophisticated uses, the prompt matters. We hope this collection of tools from Anthropic, and the examples of how we use them, will help you hone your skills.
We’ll leave you with something cool: Artist Holly Herndon’s tool, Holly+, allows users to upload audio files that are then re-created using Herndon’s voice.
AI Disclosure: We used generative AI in creating imagery for this post. We also used it selectively as a creator and summarizer of content and as an editor and proofreader.
¹ This paragraph and the one before. The tip-off is that a sentence begins with “however.” This is against our style guide, and even though that rule is in Claude’s instructions, it led with “however” nonetheless. As with any writer you may use or retain, you still need to edit. Here’s the text we gave Claude 3 Opus:
Give me two paragraphs that make this point:
We’re not so sure that prompt engineering is becoming less important. In fact, we think that it might be that prompt design is becoming both less and more important -- less important for basic queries, but more important for very specific use cases. Getting the right LLM voice is an example. Many have found LLMs to be poor ghostwriters. But if you give them a well-crafted prompt with a detailed style guide and lots of examples, they can do a great job. These two paragraphs are an example. We gave our Claude Confluence Ghostwriter six sentences in instruction and it wrote these two paragraphs. We've not edited them.
² Today you will be writing instructions to an eager, helpful, but inexperienced and unworldly AI assistant who needs careful instruction and examples to understand how best to behave. I will explain a task to you. You will write instructions that will direct the assistant on how best to accomplish the task consistently, accurately, and correctly. Here are some examples of tasks and instructions.
[examples deleted for brevity]
That concludes the examples. Now, here is the task for which I would like you to write instructions:
[your query request would go here]
To write your instructions, follow THESE instructions:
1. In <Inputs> tags, write down the barebones, minimal, nonoverlapping set of text input variable(s) the instructions will make reference to. (These are variable names, not specific instructions.) Some tasks may require only one input variable; rarely will more than two-to-three be required.
2. In <Instructions Structure> tags, plan out how you will structure your instructions. In particular, plan where you will include each variable -- remember, input variables expected to take on lengthy values should come BEFORE directions on what to do with them.
3. Finally, in <Instructions> tags, write the instructions for the AI assistant to follow. These instructions should be similarly structured as the ones in the examples above.
Note: This is probably obvious to you already, but you are not *completing* the task here. You are writing instructions for an AI to complete the task.
Note: Another name for what you are writing is a "prompt template". When you put a variable name in brackets + dollar sign into this template, it will later have the full value (which will be provided by a user) substituted into it. This only needs to happen once for each variable. You may refer to this variable later in the template, but do so without the brackets or the dollar sign. Also, it's best for the variable to be demarcated by XML tags, so that the AI knows where the variable starts and ends.
Note: When instructing the AI to provide an output (e.g. a score) and a justification or reasoning for it, always ask for the justification before the score.
Note: If the task is particularly complicated, you may wish to instruct the AI to think things out beforehand in scratchpad or inner monologue XML tags before it gives its final answer. For simple tasks, omit this.
Note: If you want the AI to output its entire response or parts of its response inside certain tags, specify the name of these tags (e.g. "write your answer inside <answer> tags") but do not include closing tags or unnecessary open-and-close tag sections.
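One practical note if you use the metaprompt through the API rather than in a console: the instructions you want arrive wrapped in the tags described above, so your code has to pull them out and substitute the variable values. A minimal sketch, using a truncated stand-in for a real response:

import re

# A truncated stand-in for a metaprompt response; real ones are much longer.
metaprompt_response = """<Inputs>
{$SCENARIO}
</Inputs>
<Instructions Structure>
Scenario first, then guidance on how to respond.
</Instructions Structure>
<Instructions>
Analyze the investment scenario provided by the user in the following text: <scenario> $scenario$ </scenario> ...
</Instructions>"""

# Keep only the text between the <Instructions> tags.
match = re.search(r"<Instructions>(.*?)</Instructions>", metaprompt_response, re.DOTALL)
instructions = match.group(1).strip() if match else metaprompt_response

# Substitute the variable the way the output above writes it ($scenario$).
prompt = instructions.replace("$scenario$", "I have $10,000 to invest for five years.")
print(prompt)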