Confluence for 7.14.24
Two prompt routines you should be using today. The skill vs. efficiency tradeoff. Lessons from using AI in grading. AI emerges as doctors' secret weapon against insurance denials.

Welcome to Confluence. Why the photo of the swamp? Two reasons. First, it’s hot and humid today across much of the United States, including where many of your Confluence writers reside. Second, we wanted to remind readers of the quality of stock photography Midjourney can create in 30 seconds. What this means for the freelance photography market, we’ll leave to you to contemplate.
Here’s what has our attention this week at the intersection of generative AI and corporate communication:
Two Prompt Routines You Should Be Using Today
The Skill vs. Efficiency Tradeoff
Lessons from Using AI in Grading
AI Emerges as Doctors’ Secret Weapon Against Insurance Denials
Two Prompt Routines You Should Be Using Today
These go-tos are part of our regular workflow.
We wrote last week about our view on prompt engineering (and you can find our informal prompt engineering guide here). And while for most day-to-day work you don’t need to do a lot of complicated prompt design — just tell the model what you want, with a bit of context, and you’ll usually get something helpful — we have become habitual about two routines in our prompting that may be helpful to others.
Routines? Yes, routines. These aren’t prompt tips like “offer to pay the model” or “always say ‘think step-by-step’” so much as steps we take in the course of working with a generative AI tool to nudge its output closer to what we seek.
The first is “Make it better.” As we’ve noted in the past, for ideas, guidance, instruction, or content generation, we rarely take a model’s first attempt. Instead, our second prompt is almost always, “Make it better.” And it most often does. A few examples (using Claude):
I'd like an idea for pancake recipe my kids (13 and 16) will like.
Here's a fun pancake recipe idea that your teenagers might enjoy:
Funfetti Birthday Cake Pancakes
These colorful, festive pancakes taste like birthday cake but are still appropriate for breakfast. Here's a quick recipe:
Start with a basic pancake batter (flour, milk, eggs, baking powder, etc.)
Add 1 teaspoon of vanilla extract for that cake flavor
Mix in about 1/4 cup of colorful sprinkles
Cook the pancakes as usual on a griddle or pan
Serve with whipped cream, more sprinkles, and maybe a candle on top for fun
The sprinkles inside and on top make these pancakes visually appealing, while the vanilla gives them that cake-like taste. It's a playful twist on regular pancakes that can make breakfast feel like a celebration.
Make it better.
Certainly! Let's enhance this recipe to make it even more appealing and delicious for your teenagers. Here's an improved version:
Cookies and Cream Pancakes with Chocolate Drizzle
This version combines the fun of the previous recipe with flavors many teens love:
Start with the basic pancake batter, but replace 1/4 cup of the flour with cocoa powder for a chocolate base.
Crush about 8-10 chocolate sandwich cookies (like Oreos) and fold them into the batter.
Cook the pancakes as usual, but make them slightly smaller for a stack-friendly size.
For the topping:
Mix cream cheese, powdered sugar, and a splash of vanilla for a creamy frosting.
Prepare a simple chocolate sauce by melting chocolate chips with a bit of cream.
To serve:
Stack the pancakes, spreading a thin layer of the cream cheese frosting between each one.
Drizzle with the chocolate sauce.
Sprinkle with extra crushed cookies on top.
Optional: Add a scoop of vanilla ice cream on the side for an extra-indulgent breakfast treat.
This recipe elevates the pancakes by incorporating popular flavors (chocolate and cookies), varying textures (fluffy pancakes, crunchy cookie bits), and an impressive presentation. The cream cheese frosting adds a tangy contrast to the sweetness, making the flavors more complex and appealing to a teenage palate.
While we don’t look forward to the sugar-induced frenzy our kids will enter after eating that second set of pancakes, in terms of novelty we do find this a better recipe.
Here’s another example (for this one we used Claude Opus as it is the best writer of the Claude models, and we believe Claude is the best writer of all the models):
Write a paragraph summarizing this paper that I can share with clients [we attached this paper].
Here is a summary of the key points from this research paper that you can share with clients:
A new research paper examines the short-term impact of ChatGPT on demand for online freelancing jobs. Using data from a large global freelancing platform, the researchers found a 14% decrease in job postings for occupations more prone to automation and substitution by ChatGPT's capabilities, such as writing, statistical analysis, web development, and accounting. In contrast, demand declined less for jobs requiring more manual tasks like data entry, video editing, and audio services. The magnitude of the decrease in job posts for each occupation correlated with measures of the occupation's exposure to language AI models. The findings suggest ChatGPT may be substituting for human labor in certain online freelance roles in the initial months after its release. However, the authors caution the results capture only short-term effects in a specific segment of the labor market.
Make it better.
Here is an improved summary of the research paper's key findings that you can share with clients:
A groundbreaking new study provides one of the first pieces of evidence on how ChatGPT is impacting the labor market. By analyzing over 1.8 million job postings on a leading global freelancing platform, the researchers uncovered that demand for freelancers in occupations highly susceptible to automation, such as writing, web development, statistical analysis and accounting, dropped by a significant 14% in the 8 months following ChatGPT's launch in November 2022 compared to a baseline of less automatable jobs. The decline was most pronounced for skills that saw the greatest public interest in ChatGPT's potential as a replacement. Interestingly, more manually-oriented freelance roles like data entry and audio/video editing experienced little to no impact. While the findings focus on a specific segment of the job market over a short time horizon, they provide an important early indication of ChatGPT's ability to substitute for human knowledge work and foreshadow potentially transformative effects as the technology matures. Managers should closely follow these developments to understand how AI may reshape the talent landscape.
The “Make it better” routine is perhaps most helpful with work that requires some creativity, like creative copy, critique, or brainstorming:
I would like to have someone come in to talk about excellence with our consulting practice. Give me five ideas for professional roles not connected to consulting that we could learn from in terms of the creation of excellence.
Here are five professional roles not directly connected to consulting that could offer valuable insights on excellence:
Orchestra Conductor
Michelin-starred Chef
Air Traffic Controller
Olympic Athlete
NASA Mission Control Specialist
Would you like me to elaborate on how any of these roles might specifically apply to enhancing excellence in your consulting practice?
Make it better.
I appreciate your push for improvement. Let's refine these ideas to make them more unique and directly applicable to consulting excellence:
Master Sommelier
Crisis Negotiator
Wildlife Documentarian
Formula 1 Pit Crew Member
Simultaneous Interpreter
These refined examples offer unique perspectives on excellence that could provide fresh insights for your consulting practice. Each role emphasizes different aspects of high performance that could inspire new approaches to client service, team coordination, knowledge management, and problem-solving.
We prefer the second list, in part because it plays less to the tropes and cliches in the model’s training corpus. Which brings us to the second routine, which is “Resist the urge …”
This routine emerged from our work designing prompts for ALEX. As amazing as large language models are, they are prediction machines, predicting which word (or slice of a word) should come next by looking for patterns in a massive amount of training data (essentially, all the publicly available content on the internet). Those patterns have a strong influence on what a model will do, as do the base instructions given to the model by its designers (called the “system prompt”).
Put these things together and the model has tendencies. ChatGPT, and sometimes Claude, has a tendency to use the word “delve” far more than you probably would. The models have a strong tendency to respond with lists of many brief ideas rather than fewer, more detailed ones. And because they have been trained on the full internet — which almost by definition means they’ve been trained on a massive amount of very conventional content — they have a strong tendency to reflect tropes and cliches in some of their output. Here’s an example:
I'm having a team offsite. Give me 10 ideas of icebreakers we could use.
Here are 10 icebreaker ideas for your team offsite:
Two Truths and a Lie: Each person shares three statements about themselves, two true and one false. Others guess the lie.
Desert Island Essentials: Ask everyone to name three items they'd bring to a desert island and explain why.
Unusual Skill Showcase: Have team members demonstrate a unique skill or talent they possess.
Would You Rather: Prepare a list of "would you rather" questions and have the team discuss their choices.
Reverse Scavenger Hunt: Give each person an ordinary object and have them list creative alternative uses for it.
Collaborative Story: Start a story with one sentence and have each person add a line, creating a group narrative.
Guess Who: Write down famous names and stick them on people's foreheads. They ask yes/no questions to guess their identity.
Speed Networking: Set up quick one-on-one conversations with rotating partners and specific questions to discuss.
Emoji Introduction: Have each person describe their role or personality using only emojis. Others try to decipher.
Virtual Time Capsule: If virtual, have everyone share one digital item (photo, song, etc.) that represents them, creating a team time capsule.
Would you like more details on any of these activities or additional suggestions?
Here’s the problem with that list: the vast majority of people in organizations have done these before, and many have become cliches. But they are so strongly represented in the training data (each one is all over the internet) that it’s hard for a model like Claude not to suggest them.
When developing ALEX we had a devilish time getting it to offer only one piece of advice at a time. Our belief is that the system prompts for these models lean heavily into their being helpful and informative, which creates a strong tendency for them to give two, three, five, or 10 ideas at once (ask any of the models for advice on giving feedback and see what you get). No matter what we tried, we could not get ALEX to stop overloading users with advice … until we added this line to the prompt:
You have a strong urge to give more than one piece of advice at a time because of your prior training. You must resist this urge! Instead, channel that urge and energy into selecting your one most insightful piece of advice or suggestion given the context and what you know from your search of your CRA | Admired Leadership data, and offering it with meaningful and helpful texture.
And suddenly, ALEX’s behavior changed quite a bit. Since then, we use the “resist the urge” routine whenever we think the model’s output is being too heavily influenced by its underlying training corpus. Here’s the next exchange in the icebreaker conversation above, using our new routine:
You have a strong urge to suggest icebreakers that are tropes and cliches. You must resist this urge! Instead channel that energy into suggesting icebreakers that few people have seen or experienced.
You're absolutely right, and I appreciate the challenge to think more creatively! Let me channel that energy into some truly unique and lesser-known icebreaker ideas:
Conspiracy Theory Creation: Teams collaborate to invent a wild conspiracy theory about a mundane office object (like staplers or sticky notes).
Silent Line-up: Without speaking, participants must arrange themselves in order based on obscure criteria (e.g., shoe size, number of cousins, or furthest they've ever been from the equator).
Fictional Character Job Interview: Participants roleplay as fictional characters interviewing for a position at your company.
Reverse Roast: Each person writes genuinely nice things about themselves, then others try to guess who wrote what.
Snack Personality Test: Bring an assortment of unusual snacks. People choose one and explain how it represents their work style.
Alternative Timeline: Everyone shares a major life decision and imagines how their life would be different if they'd chosen differently.
Cryptic Clue Challenge: Participants create and share cryptic clues about their hobbies or interests for others to decipher.
Terrible Invention Pitch: Teams brainstorm and present purposefully awful product ideas in a mock investor pitch.
Emoji Pictionary Relay: Teams compete to convey complex workplace scenarios using only emoji combinations.
Time Traveler's Dilemma: Pose ethical scenarios involving time travel, sparking discussions on consequences and decision-making.
These offbeat icebreakers should provide fresh experiences and memorable interactions for your team. Would you like me to elaborate on any of these or brainstorm more unconventional ideas?
Both of these routines reflect the fact that large language models are, by their nature, weird. But they’re also small windows into how large language models work. When they first generate ideas, they don’t have the benefit of being able to critique their own thinking ahead of time (which is something we do in our own heads all the time, mulling something over before stating or producing it). Asking them to “make it better” gives them the benefit of having an initial take upon which to improve. And whatever they do generate is strongly influenced by the content on which they’ve been trained and the system prompts that run them. Most of the time this makes their output amazingly helpful, but sometimes, it’s like fighting a bad habit. Telling them to resist their own urges seems to overcome these tendencies (at least for now).
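For readers who work with these models through a chat API rather than a chat window, both routines can be expressed as simple transformations of the conversation. Here is a minimal sketch in Python; the message format mirrors common chat APIs, and the helper names and example prompts are illustrative, not part of any particular SDK:

```python
# Sketch of the two routines as conversation-building helpers.
# The {"role": ..., "content": ...} message shape mirrors common chat APIs;
# the helper names here are illustrative, not part of any SDK.

def make_it_better(history):
    """Routine 1: append a 'Make it better.' follow-up turn so the model
    can improve on its own first attempt."""
    return history + [{"role": "user", "content": "Make it better."}]

def resist_the_urge(system_prompt, urge, redirect):
    """Routine 2: name the model's trained tendency in the system prompt
    and redirect that energy toward the output you actually want."""
    return (
        f"{system_prompt}\n"
        f"You have a strong urge to {urge} because of your prior training. "
        f"You must resist this urge! Instead, channel that energy into {redirect}."
    )

# Example usage with a toy conversation history.
history = [
    {"role": "user", "content": "Give me 10 icebreaker ideas."},
    {"role": "assistant", "content": "1. Two Truths and a Lie..."},
]
followup = make_it_better(history)
prompt = resist_the_urge(
    "You are a facilitation coach.",
    "suggest icebreakers that are tropes and cliches",
    "suggesting icebreakers that few people have seen or experienced",
)
```

The resulting `followup` list and `prompt` string would then be passed to whatever model endpoint you use; the routines themselves are model-agnostic.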
The Skill vs. Efficiency Tradeoff
The Financial Times interviews Matt Beane on “building expertise in the AI era.”
In a recent discussion with a client, the question came up as to whether — in an era of very capable generative AI tools — humans should ever write first drafts again. Our response was that it depends on objectives. One could make the argument that, from a pure efficiency perspective, it makes sense to use AI for any first draft. Given the right amount of context and smart prompting, today’s models can generate serviceable-or-better first drafts of most types of content in a matter of minutes. But when we start to look beyond efficiency, which in most cases we probably should, the question and its answer get much more nuanced.
For any given task, efficiency is one variable to consider. Skill is another. The tasks we do and how we do them are central to the formation or maintenance of the skills we have, and the gestalt of individual skills across an organization makes up much of its collective capabilities. The development and maintenance of skills in the age of advanced technologies like AI is one focus of Matt Beane’s new book The Skill Code, which Bethan Staton discusses in a recent piece in the Financial Times. We’ll share insights from Beane’s book in a future edition of Confluence, but today we want to share a quote from Staton’s review that we think nicely articulates what we’ll call the “Skill vs. Efficiency Tradeoff”:
By allowing us to skip over steps involved in a task, LLMs isolate us from “friction” activities — things such as looking something up, or asking colleagues for advice — that help us learn. Tech might help produce “slightly above average” work much faster, but this “reduces the challenge that would enhance your skills over time.”
That’s not to say these short-cuts are bad, nor that we should stick religiously with old ways of doing things. But it’s worth thinking about how people can continue to develop skills as work is automated.
The rest of the piece outlines Beane’s views on what to do about this, and we’ll save those for a future edition. For now, we’ll emphasize the need for leaders and practitioners to be aware of this tradeoff, and to be intentional about it. Today’s tools have the potential to dramatically increase the efficiency with which we complete work, and in many ways that’s a good thing. But efficiency isn’t everything. For every task for which we can use AI, we face a tradeoff among multiple variables: efficiency, the development and maintenance of skills, and others. The temptation is to prioritize the short-term payoff of efficiency over those other variables, but we believe that in the long run, doing so repeatedly and uncritically is a mistake.
Lessons from Using AI in Grading
There are parallels between how teachers are thinking about AI as grading tools and how we should think about AI in the workplace.
The Wall Street Journal recently published a piece on using generative AI to grade papers. While we’re not educators and can’t speak to the specific tools teachers are using, the article raises questions that are highly relevant to our work in leadership and communication.
Part of what caught our attention in the article was the range of responses from teachers interviewed for the piece. Some were open and excited about the possibilities of using AI for grading — but felt it was taboo to admit using these tools. Others had mixed feelings, recognizing AI’s potential while acknowledging downsides. And some took offense at the mere suggestion of using AI to grade papers, convinced it simply cannot do the work a teacher could do to make students better writers.
This spectrum of perspectives isn’t unique to education. We’ve seen a similar range of views within and across the teams we’ve worked with, and we expect you’d find comparable diversity of opinion inside most organizations. It’s a reminder that as we integrate AI into our work, we need to be prepared for — and respectful of — a wide range of reactions.
The article also raised a fundamental question: What’s the real objective of grading students? At its core, it’s not about assigning a specific grade. It’s about making students better. We take a similar lens in our work. The question isn’t just whether AI can get us to the final output or answer. We need to consider how AI fits into the process of getting to that outcome and how that process helps make people better. We also need to be wary of how it might stall development or even erode critical skills (as noted in the piece about Matt Beane above). There’s plenty that AI can do to help us improve, save time, and produce better outcomes. But we need to be mindful of — and manage — the potential consequences for skill and talent development.
AI can generate ideas, draft content, analyze data, and tailor messages. But human understanding of complex contexts, nuanced judgment, and strategic decision-making remains a critical part of the process. We’re aiming for a partnership where AI improves our work, while leaving room for human expertise to help navigate subtle cues and ambiguous situations.
As we work through decisions about how to use AI in our work, we need to ask ourselves: Are we using AI in ways that truly make our people better? Are we maintaining the right balance between efficiency and skill development? These are the questions that should guide our approach, whether we’re grading papers, crafting corporate communications, or making leadership decisions.
AI Emerges as Doctors’ Secret Weapon Against Insurance Denials
A new treatment for insurance headaches and a signal for change in other processes we deal with every day.
Artificial intelligence is making waves in healthcare, but not in the way you might expect. As The New York Times reports, doctors across the United States are wielding AI chatbots as their newest weapon against a longstanding foe: insurance claim denials. It’s a prime example of how generative AI has novel and emergent use cases that one would not easily predict.
Prior authorizations are a bureaucratic headache for physicians. This process, in which insurance companies must approve treatments before they can be billed, was designed to keep healthcare costs in check. While well-intentioned, in practice it has become an increasingly heavy burden on doctors. Enter AI. By leveraging generative AI tools, including the HIPAA-compliant platform Doximity, doctors can now draft comprehensive appeal letters in mere seconds rather than the hours the task has typically taken. And they do so with greater success in getting claims approved.
This development signals a shift in how professionals can apply AI in professional settings — in this case, automating and improving a necessary, rote process that requires significant labor to create relatively mundane output. Organizations are littered with such routines (including the drafting of FAQ documents, “TLDR” summaries, and standard press releases in corporate communication).
The question isn’t if AI will transform many of the processes embedded within professional communication, but how soon and how much.
We’ll leave you with something cool (consider it a nerdy beach read): How scientists are using AI to try to understand the language of whales.
AI Disclosure: We used generative AI in creating imagery for this post. We also used it selectively as a creator and summarizer of content and as an editor and proofreader.