Confluence for 6.8.25
Research on the limits of prompt engineering. Claude gets a blog. New Advanced Voice Mode for ChatGPT. College campuses as generative AI laboratories.
Welcome to Confluence. Here’s what has our attention this week at the intersection of generative AI, leadership, and corporate communication:
Research on the Limits of Prompt Engineering
Claude Gets a Blog
New Advanced Voice Mode for ChatGPT
College Campuses as Generative AI Laboratories
Research on the Limits of Prompt Engineering
How to approach testing the limits of LLMs in specific domains.
When we’re working with a client and they aren’t satisfied with an LLM’s output, our advice is often to experiment with the prompt to see if they can get better outcomes. This usually helps, but not always. There are times when, no matter how carefully or intentionally we prompt a model, we run into the limits of its capabilities. A new paper from Philipp Schoenegger, Cameron Jones, Philip Tetlock, and Barbara Mellers reminds us of this.
Their goal was to see if careful prompt engineering could improve LLM forecasting abilities. This wasn’t a casual experiment. They tested 37 distinct prompts across a range of models, with Claude 3.5 Sonnet being the most advanced (notably, none were reasoning models like o3 or Claude 4). The research team drew prompts from established LLM literature, behavioral science, and forecasting expertise. They tested everything from emotional appeals to sophisticated multi-step reasoning frameworks. They even included prompts generated by OpenAI’s and Anthropic’s automatic prompt generators.
Still, they saw no real improvement in forecasting abilities. Some prompts actually made things worse. Even compound prompts combining successful elements showed minimal gains. From these null results, a lesson stands out.
Models have limits, but finding them takes people with deep domain expertise and enough experience and knowledge of LLMs to probe for them. Philip Tetlock, one of the study’s authors, is a renowned researcher in judgment and the author of the book Superforecasting. If anyone understands what distinguishes the best forecasters in the world from the rest of us, it’s him.
LLM capabilities are emergent. We learn what they can and cannot do through experimentation. But it takes people who understand the task at hand to find where the jagged frontier of generative AI capabilities lies.
We’ve seen this in our own firm and with clients. We find that people with deep expertise in a specific domain who also have real skill in prompting can more completely and accurately assess models’ capabilities because they embed their expertise in how they prompt and test the model. They understand the process and subtasks of the work they’re asking the model to perform (like creating a communication strategy or evaluating a business opportunity), so they can better evaluate the output and improve the prompt.
When you’re exploring the edges of LLM capabilities, don’t do so casually. Bring people with the right domain expertise to the task, equip them with robust guidance on prompting (or have them partner with colleagues who have that expertise), and set them loose. They will find the edges of generative AI capabilities, deepening your team’s and organization’s understanding of how and when to use this technology and, likely, expanding their own knowledge and skills in the process.
The frontier of generative AI capabilities shifts and expands over time, and mapping it requires the kind of domain expertise that Tetlock and the team brought to forecasting. For our readers, this means your best insights about generative AI’s capabilities in your field will come from your own experts, properly equipped and given time to explore.
Claude Gets a Blog
When should AI be the author?
A few weeks ago, we introduced Claude as a coauthor of Confluence. As we noted then, we wanted to test the capabilities of Claude Opus 4 by having it write an item for us start to finish, unedited. We found the output impressive (and valuable) enough that we’ve done it again and now plan to make it a regular feature. Since then, at least one organization has decided to go in a similar direction, as Anthropic announced they are giving Claude a blog on the Anthropic website.
On its own, this announcement is not major news. We’re noting it here because we believe it could represent a step in an interesting direction. As the writing capabilities of large language models — particularly their ability to mimic the voice of an individual or organization — continue to improve, it would not surprise us to see more written content where AI is the credited author. And while that would have raised red flags a year or two ago, today we think it makes more sense.
We wrote just two months ago about why you still need to write, and we hold that same conviction now. So, what’s the argument for writing where AI does the work and has the byline? For us, the byline is the critical part of the equation. As we’ve noted several times in the last two years, the amount of writing done by AI but with a human byline is growing. Every day we see dozens of LinkedIn posts, emails, and more that fit that bill. This approach is the riskier one if we care about credibility and authenticity. We wrote last month that:
What will separate good communication from great in an increasingly AI-mediated world will be communicative judgment — the ability to make communicative choices that send the message we want to send, including about who we are and the relationship we have (or want to have) with those on the other end of our communication.
…
When we succumb to [blindly outsourcing our writing to AI] it will be obvious, and it will send a message about who we are and our relationship to those on the other end of our communication. The questions will be: What message do we want to send? And what choices do we need to make about our use of these tools that reinforces rather than undermines that message?
It comes down to intention. When we use Claude to write a piece for us, it’s generally to convey information. For our first two Claude-authored posts, the task was to summarize a research paper and share some potential implications (steered by direction we gave it). In this week’s edition (see our next item), we gave Claude a more challenging task. We’re comfortable enough with its accuracy and its ability to mimic our voice to do this, and by experimenting in this way we learn its limits. By crediting Claude as the author, we make what we’re doing transparent.
We write the rest of Confluence, 75% or more of it, “by hand,” with at most editorial assistance from Claude or another AI. In posts like this, we want full control over our message and the nuances we wish to convey. We do not want to outsource that to AI for many reasons, not least of which is that the act of writing allows us to further develop our point of view. In many cases, writing is not a means to an end but a valuable end in itself.
But not all cases. Sometimes we simply need to share information for instrumental purposes: to summarize efficiently and accurately, to clearly convey procedural or administrative information, and so on. Those are the cases where AI authorship (with clear designation as such) may increasingly make sense.
It will likely take a while to reach anything resembling an equilibrium. The development of the technology will continue to outpace the development of the social norms surrounding its use. One component of that future equilibrium, though, may involve increasing comfort with AI as author, with clear designation as such. That feels more sustainable than the position we’re in today, where so much AI writing is passed off as human product. Time will tell, but in the meantime, we’ll continue our experiment of Claude as coauthor. Speaking of which…
New Advanced Voice Mode for ChatGPT
The ability to talk to your technology continues to advance.
We’ve made Anthropic’s Claude a member of the Confluence writing team, with Claude authoring an item each week. Here’s this week’s missive, and the prompt we used to create it is in the footnotes.1 It was a more challenging test than in prior weeks, as we asked Claude to do the research for the piece online rather than providing an article or source ourselves. Claude seems to have gotten those sources correct, though we ourselves might have included fewer “users say” citations. That said, we include it here unedited, aside from inserting the video we recorded.
OpenAI released a significant update to ChatGPT’s Advanced Voice Mode on June 7, making conversations sound even more natural and human-like. The update introduces subtler intonation, realistic cadence with pauses and emphasis, and more accurate expressiveness for emotions like empathy and sarcasm. The company also added real-time translation capabilities, allowing users to ask ChatGPT to translate between languages and maintain that translation throughout an entire conversation.2 Early user reactions suggest this update represents a meaningful step toward truly natural AI conversation, with one user noting it “doesn’t just talk, it performs. Feels less like an AI and more like a human friend.” We recorded a brief demo:
The implications for service industries are becoming clearer as voice interactions approach human-level naturalness. Customer service roles, sales conversations, and support functions that rely heavily on voice-to-voice interaction are increasingly viable candidates for AI augmentation or automation. Organizations are already exploring applications in customer service training, financial advisory conversations, and health coaching where emotionally intelligent responses matter. IBM reports that 27% of organizations already use generative AI for spoken customer communication, with 75% planning to implement these solutions by 2025. This isn’t just about replacing human agents but creating new possibilities for training, simulation, and 24/7 availability that weren’t practical before.
The practical applications for knowledge workers extend well beyond customer-facing roles. Advanced Voice Mode can run in the background even when your phone is locked or you’re using other apps, making it ideal for hands-free brainstorming during commutes or walks. The natural conversation flow makes it particularly effective for thinking through complex problems aloud, getting quick answers to questions throughout the workday, or practicing presentations. Users are discovering that speaking directly to the AI feels “much more authentic and responsive” than text-based interactions, opening up new use cases for conversational reflection and real-time thinking partnership that feel fundamentally different from traditional search or text queries.
The steady stream of improvements points to a future where the line between human and AI conversation becomes increasingly indistinguishable. What we’re witnessing isn’t just incremental progress in voice technology but a fundamental shift toward voice-first AI interaction. As these capabilities mature and become more widely available, the assumption that meaningful AI interaction happens primarily through text may prove to be a brief historical moment rather than a permanent feature of how we work with artificial intelligence.
College Campuses as Generative AI Laboratories
And why we think it matters beyond the classroom.
The New York Times published an article this week detailing OpenAI’s aggressive push to embed ChatGPT into every aspect of college life, from orientation through graduation. The piece, which builds on themes we’ve been exploring for years about generative AI adoption in educational settings, reveals just how quickly the landscape is shifting. While the initial panic over cheating and laziness hasn’t fully abated, the conversation has evolved into universities racing to become “AI-native” institutions. This latest development is a clear signal that the fundamental infrastructure of how we learn and work is being rebuilt in real time.
OpenAI’s campus strategy is comprehensive: they’re selling premium ChatGPT Edu services to universities, running billboard campaigns targeting students during finals, and working with schools like Cal State to provide AI access to over 460,000 students. The company envisions a near future where every incoming student receives a personalized AI account alongside their school email, including custom study bots for each class, AI-powered career coaching, and chatbots that remember your learning journey from freshman year through your first job. Google and Microsoft are scrambling to keep pace, as we discussed in our post from late April about free generative AI services for students.
While there’s much to consider on the surface, we also see this moment as having deeper and broader implications well beyond campus. Whatever unfolds in universities over the next few years may preview how AI adoption spreads through the broader workforce. Here’s a telling data point: when two of our CRA | Admired Leadership® colleagues recently held a session in a college classroom and asked who regularly used AI, 100% of hands shot up immediately. Compare that to corporate settings, where we typically see 50-60% of hands go up in response to the same question. That gap tells us something important about where we’re headed.
Universities are becoming laboratories for understanding how humans and generative AI will collaborate in professional contexts. As these students graduate and enter the workforce (likely bringing their AI assistants with them), they’ll reshape workplace expectations and practices. The questions students and educators are wrestling with now about generative AI’s impact on critical thinking, skill development, and human agency are ones every organization will face tomorrow. Those interested in understanding where things are headed should pay attention to what’s happening on campuses today and in the days to come.
We’ll leave you with something cool: Ethan Mollick shows three years of AI progress through 32 AI-generated pictures of otters.
AI Disclosure: We used generative AI in creating imagery for this post. We also used it selectively as a creator and summarizer of content and as an editor and proofreader.
Project Prompt:
Claude is a writer for Confluence, CRA's weekly newsletter about the intersection of AI, communication, and leadership. Claude reads 5-10 past issues of Confluence at http://craai.substack.com to get a feel for its writing and editorial style. Claude then writes an article for Confluence based on the instructions from its user.
When it writes, Claude honors these rules: Claude uses dashes sparingly, if ever. Claude does not start sentences with the word "however." Claude does not use semicolons.
The user will list Claude as the author for this Confluence item, so Claude gets proper credit. Claude gives it its all!
Chat Prompt:
Hi Claude. This past week OpenAI released an updated Advanced Voice Mode for ChatGPT. We'd like to include a piece about it in Confluence this week. Do some research online to get the facts about the new voice mode, and write the piece. Give me three or four paragraphs. We have a short screen recording of one of us using it that we will insert in the article as a demo. Put that where you wish in the article. The main point we want to make is that the voice modes are getting better and better, and we presume that soon talking to generative AI tools will be as natural as talking to people. This obviously has implications for service industries (service agents) and other roles that require voice-to-voice interaction where automation with LLMs can occur. But it's also a great way to interact with an LLM, either with the voice mode open on your desktop to which you just ask questions through the day, or to learn about topics or have a brainstorming partner when driving or walking (as two mundane examples). This update happened just this week, so be sure not to pull older news items in your research. Ok, go.
This feature, frankly, is amazing. Travel to places where you don’t speak the language will never be the same.