Confluence for 8.3.25
Alexa+ and the LLM-powered future. New research on AI usage across occupations. Google launches Deep Think for Gemini 2.5 Pro. Simplest prompting is best.

Welcome to Confluence. We have two general items to mention up front today. First is a new podcast edition of each issue. You can access today’s issue here, and an archive of all future issues via the podcast link on the Confluence main page (and soon via Apple Podcasts and Spotify, if you prefer those services). We’ll note that the podcast is created entirely with AI via the Google NotebookLM “audio overview” feature. We hope it’s of use to those who prefer to get their weekly generative AI update by ear rather than by eye. The second is a new intensive course we’re launching on generative AI for communication professionals. We believe this eight-week virtual course is best-in-class, and an excellent way to credential oneself in generative AI for communication practice. The course launches September 11, and you may learn more about it here.
With that done, here’s what has our attention this week at the intersection of generative AI, leadership, and corporate communication:
Alexa+ and the LLM-Powered Future
New Research on AI Usage Across Occupations
Google Launches Deep Think for Gemini 2.5 Pro
Simplest Prompting Is Best
Alexa+ and the LLM-Powered Future
Our experience with the new Amazon assistant validates our predictions.
We’ll fess up to something you probably already suspect — your Confluence team leans to the “nerd” side of the nerd/cool continuum. We know you’re shocked to learn this, but it’s true. This tendency leads us to be early explorers of many technical innovations, Amazon’s forthcoming Alexa+ voice assistant among them.
Alexa isn’t new to many of us. Many homes we enter have Alexa or its competitors from Apple and Google. But Alexa+ is new, and our experiences with it this week suggest a change we’ve been forecasting for two years but which most people don’t see coming. As context, we asked Claude to author a quick summary of Alexa+:
Amazon officially unveiled Alexa+ in February 2025, marking what the company calls a “complete re-architecture” of its decade-old voice assistant using generative AI technology. Powered by large language models available through Amazon Bedrock — including both Amazon’s own Nova models and Anthropic’s Claude — Alexa+ represents a significant leap beyond the basic commands and responses that defined the original Alexa. The service costs $19.99 per month for general users but comes free for Amazon Prime members, positioning it as a new Prime benefit alongside faster delivery and streaming content. Early access has begun with Echo Show 8, 10, 15, and 21 devices, with Amazon promising to roll out to additional users in waves over the coming months.
What sets Alexa+ apart is its agentic capabilities — the ability to navigate the internet and complete multi-step tasks autonomously on behalf of users. Amazon demonstrated how Alexa+ can research and schedule an oven repair service, handling everything from finding a service provider on Thumbtack to authentication and appointment booking, then reporting back when complete. The assistant can also integrate with services like OpenTable for restaurant reservations, Grubhub for food delivery, Ticketmaster for event tickets, and various smart home devices, allowing for complex, multi-service requests in natural language. Beyond task execution, Alexa+ offers enhanced conversational abilities, visual understanding through device cameras, and can parse uploaded documents and emails for future reference — positioning Amazon’s assistant as both a productivity tool and a more capable competitor to ChatGPT and similar AI assistants.
(If you’re trying to identify large-language-model / LLM-authored text, the em dash is a giveaway, at least for now.) We have been toying with Alexa+ for only a few days, but we’re already impressed. It’s very capable, and if it keeps the promise of integrating with things like OpenTable, home devices, and the like, the potential is significant. But there are two larger implications.
First, we’ve been saying since the beginning that (1) LLMs are general purpose technologies, which, by their nature, become integrated into many other technologies (like your home voice assistant speaker), and (2) this being the case, the day will come when we talk to our tools and devices, rather than typing, switching, and manipulating them. Yes, we’ve been able to do some manner of that with things like Siri and Alexa, but because of LLMs’ incredible proficiency with human language, we’re talking about a whole new level of conversational instruction.
Imagine being able to have a conversation with your car, using your spoken voice not just to pick destinations or play a song, but to adjust the air conditioner, to find out if your tire pressure is low, or to have an intelligent conversation about which route to take rather than clicking options on a screen. Or imagine telling your oven in natural language what you want it to do: “Hey oven, pre-heat in convection bake to 425 and set a timer for 30 minutes.” Or (and Alexa+ should be able to do this now), “Hey Alexa, I want the front-hall light to go on whenever our front-door camera detects motion between 11 PM and 6 AM. Set that up for me, please.” In the very near future we will have natural-language conversations with our tools.
The second implication may be more significant, and it’s that Alexa+ may be the first device that brings LLM technology to the masses in a way not embodied in an app, phone, or computer. Estimates are fuzzy, but Amazon has sold at least 500 million Alexa devices, and there are about 250 million Prime members. Regardless, with Alexa+ several hundred million people will suddenly, and without necessarily knowing it, have one of the world’s most powerful and practically capable LLMs sitting in their home. This is a classic example of a general purpose technology: nobody thinks about the transistor when using a TV or cellphone; they think about the thing they are using. Prime members don’t need to know about LLMs to begin taking advantage of one (and its very capable general intelligence) in Alexa+.
We saw that with our Alexa+ usage this week. Yes, it ordered protein bars for us and turned off the lights for us as it used to. But it also, seamlessly, went from giving us the local weather to telling us a trigonometry formula for calculating field of view for a screen based on screen width and eye-point location (don’t ask). Can your Siri or (old) Alexa do that, or this?
Alexa, write a poem combining the ideas of vitamin supplements and crossword puzzles and email it to me.
Okay.
Title: “Puzzles and Pills”
In days gone by, we’d gather ’round,
With pencils sharp and minds so sound.
To solve the crosswords, clues unfurled,
A simpler time, a slower world.

Our vitamins came from the sun,
From fresh-picked fruit and outdoor fun.
No pills to pop, no capsules small,
Just nature’s bounty, free for all.

Now supplements line shelves so neat,
While crosswords dance on screens so sleek.
We long for simpler days of yore,
When life seemed richer, somehow more.

Yet still we find some comfort true,
In puzzles solved and health renewed.
Though times have changed, some joys remain,
In vitamin and inky game.
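As for that field-of-view question: the answer reduces to a single piece of trigonometry. This is our own rendering, not Alexa’s verbatim output, and it assumes a flat screen of width $w$ viewed straight on from an eye-to-screen distance $d$:

$$\text{FOV} = 2\arctan\left(\frac{w}{2d}\right)$$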
People keep asking when LLMs are going to really have their effects. One answer we’ve been giving has been, “When they are in enough hands and people see what they can do.” Given the power of current models, that’s now a distribution problem, not a capability problem. Alexa+ is going to bring a new, large audience into the world of generative AI, without many of them even knowing they’ve entered it. It and tools like it will change how these people use and benefit from technology, and it will change their expectations for every other interaction they have with it.
People keep asking when LLMs will go mainstream. At least for Amazon Prime members with Alexa, they’re about to.
New Research on AI Usage Across Occupations
The findings and the media’s portrayal of them paint two different pictures.
This week, Microsoft published a new research paper titled “Working with AI: Measuring the Occupational Implications of Generative AI”. The paper is noteworthy, as is the media’s response to it, so we enlisted Claude to author two pieces: one on the paper, and one on the media response. Both of Claude’s pieces represent its first output, with two minor edits made by us: adding a definition for an acronym, and adding the link to the Benzinga article. As always, our prompts are in the footnotes.1
Here’s the first piece, which focuses on the paper itself and the implications for leadership and communication:
A team from Microsoft Research has taken a novel approach to understanding AI’s workplace impact by analyzing 200,000 anonymized and privacy-scrubbed conversations between users and Microsoft Bing Copilot, a publicly available generative AI system. Rather than relying on predictions or laboratory experiments, this research provides a window into how people are actually using AI tools in their daily work. The findings reveal important patterns that should shape how leaders think about AI adoption and workforce development.
The researchers made a crucial distinction that often gets overlooked in discussions about AI and work. They identified two distinct ways in which a single conversation with an AI assistant can affect the workforce, corresponding to the two parties engaged in conversation. First, there’s the user goal — what the person is trying to accomplish with AI assistance. Second, there’s the AI action — what the AI itself is doing in the conversation. This distinction matters because it separates AI augmentation (helping humans do their work better) from AI automation (the AI performing work directly). For example, when someone asks AI for help resolving a computer issue, their goal is technical problem-solving, but the AI’s action is providing technical support — two different work activities that affect different occupations.
The most common work activities people seek AI assistance for involve gathering information and writing, while the most common activities that AI itself is performing are providing information and assistance, writing, teaching, and advising. This asymmetry is striking: in 40% of conversations, the user’s goals and the AI’s actions involve completely different work activities. The AI often acts in service roles — as coach, advisor, or information provider — while users leverage it primarily for knowledge work tasks. The occupations with the highest AI applicability scores are knowledge work occupation groups such as computer and mathematical, and office and administrative support, as well as occupations such as sales whose work activities involve providing and communicating information. Interpreters and translators topped the list, with 98% of their work activities overlapping with tasks people frequently use AI for.
The implications for leadership and communication are significant. First, the research confirms what many have suspected: AI adoption is happening now, broadly, and across many occupations — not just in tech roles. Leaders should assume their teams are already using these tools and focus on channeling that usage effectively rather than controlling whether it happens. Second, the distinction between AI assistance and AI performance suggests that most current AI usage enhances rather than replaces human work. The authors note that their data indicate “AI can help users with a broader fraction of their work than it can perform directly.” This should inform how organizations approach AI integration — focusing on augmentation strategies that help employees work better rather than automation strategies that eliminate roles. Finally, the research reveals that AI is particularly effective at information gathering, writing, and communication tasks — core activities in corporate communication work — suggesting that communication professionals should be at the forefront of learning to work effectively with these tools.
And here’s the second, which focuses on the media response:
“Microsoft Paper Warns AI Could Make Historians Extinct But Leave Embalmers Alone,” screams one recent headline. If you’ve been following coverage of Microsoft’s new research on AI and jobs, you might think scientists have definitively identified which occupations are doomed and which are safe from the robot apocalypse. The research, which analyzed 200,000 anonymized Bing Copilot conversations, is real and substantive — but the headlines fundamentally misrepresent what it actually found. As the researchers themselves warn: “It is tempting to conclude that occupations that have high overlap with activities AI performs will be automated and thus experience job or wage loss... This would be a mistake, as our data do not include the downstream business impacts of new technology, which are very hard to predict and often counterintuitive.”
The study’s key insight is the distinction between what users are trying to accomplish (user goals) and what the AI is doing (AI actions). This matters because they’re often different. When someone asks AI to help troubleshoot a computer issue, their goal is to fix their computer, but the AI’s action is providing technical support. This distinction reveals something the breathless headlines miss: in 40% of conversations, what the user is trying to do and what the AI is doing are completely different activities. The user goal and AI action IWAs [editor’s note: an IWA is an “intermediate work activity,” a unit used to classify different types of work] again show that humans are using AI to gather, process, and disseminate information while the AI is helping by gathering, explaining, and communicating information to the user. This suggests AI is primarily playing a supporting role — acting as coach, advisor, or information provider — rather than directly replacing human work.
What the data actually shows is that information gathering, writing, and communicating with others are the most common user goals in Copilot conversations. Knowledge workers and those in communication-focused roles see the highest “AI applicability scores,” with occupations like Interpreters and Translators, Writers and Authors, and Customer Service Representatives showing significant overlap with AI capabilities. But here’s the crucial part the headlines miss: even for these highly affected occupations, the task completion rate is not 100% and the scope of impact is usually moderate. The research found that AI typically assists with a portion of work activities rather than replacing them wholesale. Physical occupations — from nursing assistants to truck operators — show minimal AI impact, which shouldn’t surprise anyone who’s tried to get ChatGPT to fix a leaky faucet.
Perhaps most importantly, the researchers themselves emphasize that their data captures AI usage, not AI impact. They explicitly state they “have no data on the downstream impacts of that usage.” Whether AI makes workers more productive, leads to hiring more people to handle increased output, or results in workforce reductions isn’t something usage data can tell us. The study notes that if AI makes software developers 50% more productive, companies could raise their ambitions and hire more developers, since each one now produces more output, or hire fewer developers, since the same amount of work now requires fewer of them. Those claiming this research shows which jobs will be “replaced” are making a leap the data simply doesn’t support. What we’re seeing is a technology being widely adopted as a productivity tool, with its ultimate impact on employment still very much an open question — one that will be answered by business decisions and societal choices, not by the technology itself.
Google Launches Deep Think for Gemini 2.5 Pro
It’s very smart, and a reminder on how to think about the capabilities of frontier models.
Google has released “Deep Think” for Gemini 2.5 Pro. This is a variation of the model that achieved gold-medal standard at the International Math Olympiad, and it brings similar (though not identical) capabilities to those who pay $250 per month for Google’s AI Ultra plan. While benchmarks don’t tell the whole story about the latest models, they offer a quick way to compare models at similar capability levels. Based on what Google has published, Deep Think performs quite well, placing at the top of most leaderboards.
Deep Think works much the same way as o3 Pro. It is not a fundamentally different model from Gemini 2.5 Pro. Instead, when you have “Deep Think” turned on, the model spends more time working on whatever problem you give it. Google’s description provides insight into how Deep Think approaches complex problem solving:
Just as people tackle complex problems by taking the time to explore different angles, weigh potential solutions, and refine a final answer, Deep Think pushes the frontier of thinking capabilities by using parallel thinking techniques. This approach lets Gemini generate many ideas at once and consider them simultaneously, even revising or combining different ideas over time, before arriving at the best answer.
Moreover, by extending the inference time or “thinking time,” we give Gemini more time to explore different hypotheses, and arrive at creative solutions to complex problems.
We’ve also developed novel reinforcement learning techniques that encourage the model to make use of these extended reasoning paths, thus enabling Deep Think to become a better, more intuitive problem-solver over time.
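Google has not published the mechanics behind Deep Think, but the pattern it describes (generate many lines of reasoning at once, then revise and combine them) resembles familiar parallel-sampling approaches. Here is a loose conceptual sketch of that pattern in Python; `generate_candidate`, `score`, and `combine` are hypothetical stand-ins for the model’s internal behavior, not Google’s API:

```python
import concurrent.futures

def generate_candidate(problem: str, seed: int) -> str:
    """Hypothetical stand-in for one independent 'line of thinking' on the problem."""
    return f"candidate answer {seed} for: {problem}"

def score(candidate: str) -> float:
    """Hypothetical stand-in for the model's own judgment of a candidate's quality."""
    return len(candidate) % 7  # placeholder heuristic, not a real scoring method

def combine(candidates: list[str]) -> str:
    """Hypothetical revision step: merge the strongest ideas into one final answer."""
    best = max(candidates, key=score)
    return f"final answer built from {len(candidates)} candidates, anchored on: {best}"

def deep_think_sketch(problem: str, n_paths: int = 8) -> str:
    # Explore several hypotheses in parallel ("parallel thinking"), then refine.
    with concurrent.futures.ThreadPoolExecutor() as pool:
        candidates = list(pool.map(lambda s: generate_candidate(problem, s), range(n_paths)))
    return combine(candidates)

print(deep_think_sketch("Field of view for a 32-inch screen at arm's length?"))
```

Again, this is an analogy for the pattern Google describes, not its implementation. The point is simply that Deep Think spends more inference-time compute exploring multiple hypotheses before settling on an answer, rather than being a new base model.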
Deep Think again demonstrates how the leading labs continue to converge on similar approaches for their flagship models. Whether you’re working with Anthropic’s Claude, OpenAI’s models, or Google’s offerings, you’ll find comparable capabilities at the frontier. The corollary is equally important: when working with older models, or with models from labs outside that frontier group, you may find them insufficient for real work.
Deep Think also offers an important reminder about the nature of today’s leading models. The more advanced the model, the more you need to push it to find its true capabilities. At this point, any of the leading models handle simple requests with ease. You may find, as we have, a preference for one model or family of models over the others, but if we were to wake up tomorrow and find we could no longer use Claude, we would still find tremendous value working with OpenAI or Google’s leading models.
Truly getting the most out of the leading models requires real work. One of our Confluence writers recently spent over 90 minutes building a single Claude Project for an urgent announcement. Ninety minutes is a significant investment of time, and some may be tempted to call it excessive and simply write the deliverables document by document. Yet that would miss the opportunity. A Project (or Custom GPT, or Gem) created by someone with deep expertise in both communication and generative AI can save hours or even days of work. The upfront time investment pays dividends.
Looking ahead, many expect GPT-5 to arrive in August. But Deep Think reminds us that understanding what makes any frontier model truly better requires going beyond simple prompts and surface-level testing. We’ll need to invest real effort in exploring these new capabilities through complex, substantive tasks to find entirely new ways of working with this technology.
Simplest Prompting Is Best
Why emotional appeals in your prompts may not be worth your time.
A new paper from the University of Pennsylvania examines whether threats or positive reinforcement in prompts produce significantly different results from frontier models. The study, from Ethan Mollick and his colleagues, shows no significant impact on outcomes when people include threats (“I will kick a puppy if you get this wrong”) or enticements (“I’ll tip you a trillion dollars if you answer this question correctly”).
We’ve regularly written about related topics, such as the strange relationship between prompt length and quality (we found that more context doesn’t always mean better output), and, more recently, we’ve shared guidance about how to give context appropriately to maximize the efficiency of your prompting. This recent research, the third mini-study in a series run by this team, confirms what we already knew to be true: Increasingly, when it comes to prompting, simplest is best.
It’s worth noting that in the study there was sometimes a positive effect on individual questions, but taken in aggregate the results were not significant. During sessions we’ve run with clients and in conversations across our firm, we’ve often discussed the many creative ways you can prompt an AI to get the result you’re looking for — things like asking the LLM to “rip your draft apart.” These approaches may still work for many people and, in our estimation, anything that gets people using AI is a good thing. But this latest research is a good reminder that there’s no need to overcomplicate your prompts or load them with emotional appeals.
What’s notable about this research is how it fits into a broader pattern we’ve been tracking. As these models become more capable, they seem to require less hand-holding. The takeaways are immediately applicable: teams should focus on the fundamentals — clear standards, relevant examples, and appropriate situational details. The models are getting better at understanding intent from plain language. The best return on the investment of our time in these tools may well be regular, purposeful use that builds practical understanding rather than elaborate prompting.
We’ll leave you with something cool: Google DeepMind’s AlphaEarth aims to “help map our planet in unprecedented detail.”
AI Disclosure: We used generative AI in creating imagery for this post. We also used it selectively as a creator and summarizer of content and as an editor and proofreader.
Claude is one of the authors for our weekly newsletter on generative AI, Confluence. In this chat it is writing a Confluence piece on assignment from me, one of the Confluence editors.
The assignment may take many forms. It may be to read an article or other piece and write about it. It may be to research a topic and write about it. It may be to editorialize. I'll give Claude context and direction with each assignment, along with any resources that should inform its work, direction on length in paragraphs, and any angle we wish to take.
Before starting to plan and write its piece, Claude first uses its web search tool to visit Confluence at http://craai.substack.com and read several issues to get the writing style.
Claude is careful to fact-check anything it writes. Hallucinations and incorrect facts damage the credibility of Confluence.
We will give Claude a byline for its work, and thousands of people will read its work, so it brings its best accuracy and quality to the role.
[For the first article] Today, Claude is writing a Confluence post about the attached research paper. The post should summarize the methodology and highlight the key implications and takeaways for leadership and communication.
[For the second article] This week, Claude is writing a Confluence post about the research paper attached here. As part of the post, Claude should address the accuracy of the headlines of articles addressing the paper, which largely claim that the paper “shows the jobs AI will replace and the jobs that are safe from AI.”