Confluence for 8.17.25
More thoughts on GPT-5. Organizational overhang. You should be using Gemini Deep Research. New research on generative AI and the labor market. The AI productivity paradox.
Welcome to Confluence. There was one interesting update in the world of generative AI this past week that we will note here: Claude can now search past chats. You simply ask it to do so, and we’ve already found the ability very helpful, especially in Projects where we’ve been working with Claude over many chats and can now say, “Remember when we worked on X? Let’s take that up again here” or “We did work on Y. Use that as context for this conversation.” Learn more about this new ability here. That said, here’s what has our attention this week at the intersection of generative AI, leadership, and corporate communication:
More Thoughts on GPT-5
Organizational Overhang
You Should Be Using Gemini Deep Research
New Research on Generative AI and the Labor Market
The AI Productivity Paradox
More Thoughts on GPT-5
“How good is it?” is actually a complicated question.
For the past week we’ve been putting OpenAI’s latest model (which is actually a suite of models) through its paces, including the “Pro” model, which only a limited set of users can access. And after a week, we’re still not quite sure how good it really is. One of your authors recently noted to his colleagues, “I do not like GPT-5 at all” and “GPT-5 Pro is very smart” (which translates to “HOLY COW!!” in our terminology) within an hour of each other. Both are true, and that’s the issue.
The key thing to appreciate is that GPT-5 is not one model but several, joined by a clever “routing” technique that by default picks which model to use based on your query. These models range from fast / cheap / not as bright to slow / expensive / very smart. There are three:
“Fast,” which is very fast, surprisingly fast, “did it really do that?” fast. We’ve had GPT-5 Fast read and digest dozens of webpages in seconds.
“Thinking,” which is a reasoning model that thinks through its actions before responding, and which OpenAI says “thinks longer for better answers.”
“Pro,” which is available only in limited release and which OpenAI says provides “research grade intelligence.” It is slow, doing in five minutes what Thinking may do in one minute and what Fast may do in seconds.
From a practical standpoint this makes a lot of sense. Different tools for different problems, and a router that decides for you makes selection automatic, which is helpful for users who don’t appreciate the differences or subtleties of the models.
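OpenAI has not published how its router actually decides, so take the following as a purely conceptual sketch: every heuristic, threshold, and tier name here is our invention, meant only to illustrate the idea of scoring a query and dispatching it to a cheaper or smarter tier.

```python
# Purely illustrative: OpenAI has not disclosed GPT-5's routing logic.
# This toy sketch shows only the shape of the idea: estimate how hard
# a query is, then dispatch it to a cheaper or smarter model tier.

def estimate_difficulty(query: str) -> float:
    """Toy heuristic standing in for whatever classifier OpenAI uses."""
    signals = ["prove", "plan", "analyze", "multi-step", "trade-off"]
    base = min(len(query) / 500, 1.0)  # longer prompts skew harder
    boost = 0.2 * sum(s in query.lower() for s in signals)
    return min(base + boost, 1.0)

def route(query: str) -> str:
    """Map estimated difficulty to a model tier."""
    difficulty = estimate_difficulty(query)
    if difficulty < 0.3:
        return "fast"       # cheap, near-instant answers
    if difficulty < 0.7:
        return "thinking"   # reasons before responding
    return "pro"            # slow, expensive, maximum capability

print(route("What's the capital of France?"))                           # -> fast
print(route("Analyze the trade-offs in our multi-step rollout plan."))  # -> pro
```

The real router presumably uses a learned classifier rather than keyword heuristics, but the economic logic is the same: reserve the slow, expensive model for the queries that genuinely need it.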
But that’s also our frustration. In “Auto” mode we are too often surprised by what we get, usually on the side of less thinking rather than more. That wouldn’t matter much if the models didn’t vary so widely in their intelligence. But they do, as this chart from the ARC-AGI Leaderboard illustrates, comparing model scores on a difficult intelligence testing regimen against their cost (which roughly tracks how long a model spends thinking):
There are more OpenAI models on this chart than just GPT-5, but you can see the wide range within the GPT-5 family, from Mini with minimal thinking at the bottom left to the full model with high thinking (which we are guessing is Pro?) at the top right. And there are many GPT-5 variants, more than the three on the model selection tool online. The issue is that you don’t know what you’re going to get, so our experiences with GPT-5 have ranged from “boy is this thing dumb” to “holy cow this thing is a genius.” While you can select among Auto / Fast / Thinking / Pro in the model picker, we still wonder which of these many variants are running behind the scenes.
That said, it’s a frontier model nonetheless, likely the most powerful in the world. Note how it compares to GPT-4, shown in blue on the chart. There’s no comparison at all to what most of OpenAI’s 700 million users (and Microsoft Copilot’s hundreds of millions more) were using 10 days ago.
So we will keep kicking the tires. When it gets selection right, GPT-5 in Auto mode is an amazing experience, creating huge volumes of valuable output without effort. But be ready for that to not always be the case.
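One aside for technically inclined readers: if you reach GPT-5 through OpenAI’s API rather than the ChatGPT app, you can sidestep Auto routing by naming a model and a reasoning effort yourself. A minimal sketch, assuming OpenAI’s current Python SDK and its Responses API (the model name, effort level, and prompt here are illustrative; check OpenAI’s documentation for what your account supports):

```python
# Pinning the model and reasoning effort instead of relying on Auto
# routing. Assumes the OpenAI Python SDK's Responses API; the model
# name and effort level shown are illustrative and may vary by account.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.responses.create(
    model="gpt-5",                 # or a cheaper variant such as "gpt-5-mini"
    reasoning={"effort": "high"},  # trade latency and cost for more deliberation
    input="Summarize the key risks in this reorg plan for our CEO.",
)
print(response.output_text)
```

Dialing effort down buys speed, and dialing it up buys deliberation, which is exactly the trade-off the Auto router is making for you behind the scenes.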
Organizational Overhang
Whatever the future holds, there’s a lot of catching up to do.
As we note above, the jury is still out on how much of a step forward in capabilities GPT-5 is. While the answer is likely complicated, the past week has nevertheless seen rampant speculation about whether we should expect a slowdown in progress at the frontier. This includes Cal Newport’s New Yorker piece that asks “What If A.I. Doesn’t Get Much Better Than This?”
It’s an important question, and the answer will have major economic and societal implications. We won’t speculate on whether development has hit a wall (though, in general, we do expect advances, however incremental, to continue). But if we take the question at face value and answer it from our experience integrating this technology into our own firm and helping clients do the same over the past two years, our answer is simple: even if progress stalls, we all still have a lot of catching up to do.
In our view, generative AI technology as it stands now is sufficiently capable to transform the nature of corporate communication and many other areas of knowledge work. Most teams are barely scratching the surface of what it can do right now. The advances over the past two years have outpaced most individuals’, teams’, and organizations’ ability to adapt to them.
Claude Opus 4.1 defines “overhang” as
the gap between AI systems' latent capabilities and what users actually extract from them – essentially, the unrealized potential that exists because people haven't yet learned how to fully utilize what these models can already do. It suggests that significant improvements in AI applications could come simply from better prompting, tool use, and implementation rather than requiring fundamentally more capable models.
That gap is real and has only increased as generative AI models and products have improved since the November 2022 release of ChatGPT.
At this point, most teams and organizations would be better served by focusing on closing this gap than by speculating about future developments. If there is indeed a slowdown in progress, and who knows whether that will be the case, now would be a great time to catch up. Our advice is to focus on developing employees’ general understanding of and proficiency with this technology. It’s already enough to increase productivity, elevate the quality of a team’s work, and raise a team’s talent density by making everyone smarter and more adept at their work. Establishing this foundational understanding and proficiency now will not only help teams close the gap between what’s possible with this technology and what they’re getting out of it, but will also prepare them to adapt more quickly to future developments, whatever those may be.
You Should Be Using Gemini Deep Research
We find it produces the best reports of the leading models.
ChatGPT, Claude, and Google Gemini all now offer a version of a “deep research” tool. Unlike standard chat, these tools use generative AI to research a query in depth: they develop a detailed research approach, use their web search tools to find and read dozens or even many hundreds of sources, and then compile lengthy reports summarizing their findings.
ChatGPT was the first to offer this ability, and if it has a weakness, we find it’s length over value, with reports often running 60-70 pages and not always free of filler or repetition. Claude also does deep research, and we find it favors very thorough background work, sometimes reading and citing 500-700 sources, though its reports are shorter, sometimes erring too far on the side of brevity for our taste.
Many people, though, have not used Google Gemini’s deep research tool, and in our recent experience, that’s probably a miss. We find it a very nice blend of ChatGPT’s detailed output and Claude’s extensive sourcing. We also really like its voice, which is direct and low on fluff but still friendly. Where it shines, though, is in applying its native intelligence to what it finds. Gemini draws conclusions and recommendations from its research that we find more insightful and practical than ChatGPT’s or Claude’s. We’ve gotten reports with direction we can act on immediately, and which we recognize as smart choices.
As with any application of a large language model, the prompt can have a large effect on what you get. That said, for your next “I really want to know more about topic X” question, try Gemini Deep Research.
New Research on Generative AI and the Labor Market
Less is happening than the discourse indicates.
Every few weeks another report attempts to divine generative AI’s effects on the labor market. Each searches for minor movements in the data and draws larger conclusions. These conclusions matter because they shape the mainstream narrative, as we’ve written about in the past.
This week, the Economic Innovation Group released a report, “AI and Jobs: The Final Word (Until the Next One),” that tells a different story. While other reports and commentary seem to search for early signs of an employment collapse related to generative AI, EIG finds no evidence for this (at least, not yet). And they looked hard. They analyzed employment data across roles with varying exposure to generative AI, examining everything from unemployment rates to occupational switching to organization-level hiring patterns.
The data reveal something surprising to those who follow the latest commentary: unemployment rates are actually rising faster for workers least exposed to AI than for those most exposed. Those whose work is most exposed aren’t fleeing to other occupations. They’re not exiting the labor force. Industries employing the most AI-exposed workers continue to grow. Even among recent college graduates, where anecdotal evidence has suggested particular vulnerability, unemployment patterns look the same regardless of how exposed the work is to generative AI.
When we talk about jobs being “exposed” to generative AI, we mean something specific. Exposure doesn’t mean an autonomous agent will start doing that job tomorrow. Rather, it means generative AI will affect how specific tasks get done. A financial examiner exposed to AI still examines finances, but they use different tools, different processes, and likely allocate their time differently. Simply put, exposure does not equal replacement.
This matters because we will continue to hear bold claims about generative AI’s impact on the labor market and broader economy, but we’re still in early days. We advise approaching these assertions with appropriate skepticism. Rather than trying to predict exactly what will happen (a fool’s errand given the pace of change), focus on what we can observe right now: the actual patterns emerging in your organization, how work evolves task by task and role by role. The future remains uncertain, but the present offers plenty of signal if you know where to look.
The AI Productivity Paradox: Why Building Beats Buying
McKinsey’s new data reinforces the importance of strategic generative AI implementation.
This week, the New York Times explored the disconnect in enterprise AI adoption. Drawing on McKinsey research, the Times reported that while nearly 80% of companies use generative AI, just as many report no bottom-line impact. Meanwhile, 42% are abandoning their AI pilots, up from 17% last year. McKinsey calls this the “gen AI paradox,” but we suspect it’s more a predictable pattern than a paradox.
The problem, McKinsey finds, lies in the gap between horizontal and vertical uses of the technology. Full-access ChatGPT licenses spread quickly because they’re easy to activate, but their benefits remain too diffuse to move the needle. McKinsey’s report claims that “higher-impact vertical, or function-specific, use cases seldom make it out of the pilot phase because of technical, organizational, data, and cultural barriers.” Why? Likely because organizations let AI initiatives bubble up from individual functions without strategic coordination.
The Times points to JPMorgan Chase as one example of what strategic focus can look like. The bank reports shutting down “probably hundreds” of AI projects but sees this as smart, not wasteful. Meanwhile, 200,000 of its employees now have AI assistants that save them four hours weekly, and its wealth advisers use specialized AI that nearly doubles their recommendation speed. The difference? The bank is reimagining entire processes, not just automating tasks within them.
This pattern of unfulfilled promises extends beyond individual companies to the broader economy. As our piece above on the labor market shows, the widely predicted job displacement hasn’t materialized; if anything, unemployment is rising faster for workers least exposed to AI. This reinforces the crucial point we see in the Times piece: successful AI transformation takes time and precision. Organizations must stop asking “What are all the many places we can insert AI?” and start asking “What would this task or function look like if reimagined with AI at the core?”
The productivity paradox isn’t evidence that AI has failed. It’s more likely evidence that most organizations have failed to commit in a productive way. In a landscape where corporate AI investments will reach $61.9 billion this year, according to tech research firm IDC, companies are running out of time and budget to get this right. McKinsey sees AI agents, autonomous systems that can execute complex workflows, as the path forward, but even these require the strategic commitment most organizations currently lack. The first step toward solving this expensive problem is asking the right questions to identify the most strategic areas for AI implementation, whether using agents or otherwise. Companies that succeed will be those that build AI implementations tailored to their specific processes, not those that simply buy off-the-shelf solutions and hope for transformation.
We’ll leave you with something cool: Google released Gemini Storybook — a way to create illustrated and narrated stories from simple prompts.
AI Disclosure: We used generative AI in creating imagery for this post. We also used it selectively as a creator and summarizer of content and as an editor and proofreader.