Confluence for 5.4.25
ChatGPT-4o's slide into sycophancy. OpenAI falls prey to the ironies of automation. How LLMs are reshaping our words. Another view on AI's future.

Welcome to Confluence. Here’s what has our attention this week at the intersection of generative AI, leadership, and corporate communication:
ChatGPT-4o’s Slide Into Sycophancy
OpenAI Falls Prey to the Ironies of Automation
How LLMs Are Reshaping Our Words
Another View on AI’s Future
ChatGPT-4o’s Slide Into Sycophancy
The dramatic unintended consequences of a “subtle” update.
Large language models want to make users happy. As we describe in more detail in our next post, the process of reinforcement learning from human feedback (RLHF) involves fine-tuning model behavior to produce outputs the models predict humans will prefer — optimizing, essentially, for more thumbs-ups. It’s part of what makes models so useful, but it’s also a facet of model behavior that we’ve long counseled our clients to be wary of. So, when we started receiving messages from friends and clients about how ChatGPT-4o felt like more of a “yes man” lately, we chalked it up to that. But, as the world learned last week, 4o’s brief turn toward exaggerated sycophancy was much more than that. A trickle of reports of 4o’s unsettling behavior turned into a torrent, with users sharing screenshots of 4o agreeing with and even encouraging nearly all thought patterns and points of view, no matter how dangerous, harmful, or irresponsible. Zvi Mowshowitz has a comprehensive account of the phenomenon here, and it’s worth scrolling through to get a sense of the scope and scale of the issue.
OpenAI eventually rolled back the model update that led to this behavior and, shortly after, published a blog post titled “Expanding on what we missed with sycophancy: a deeper dive on our findings, what went wrong, and future changes we’re making.” The answer to what went wrong, it turns out, is complicated, but largely stemmed from over-indexing on user preferences (more on that in the next section). What we all know for sure is that an unannounced update to the 4o model was pushed into production for hundreds of millions of users — and that it had major (and, in fact, downright dangerous) issues that somehow made it through OpenAI’s pre-release diligence. In the post, OpenAI commits to a number of technical and testing changes for future model releases, and also to the following:
Communicate more proactively: We also made communication errors. Because we expected this to be a fairly subtle update, we didn’t proactively announce it. Also, our release notes didn’t have enough information about the changes we’d made. Going forward, we’ll proactively communicate about the updates we’re making to the models in ChatGPT, whether “subtle” or not. And like we do with major model launches, when we announce incremental updates to ChatGPT, we’ll now include an explanation of known limitations so users can understand the good and the bad.
Let’s hope this becomes the new precedent.
What are the other lessons here? One hopes there are many — for OpenAI and its competitors who are shaping these technologies consumed by an increasingly large percentage of the world’s population, and for the hundreds of millions of us who use them every day. We’ll focus on two points that we think are most important for our readers. The first is that seemingly small changes to these models can have surprising effects on their behavior. In the post we linked to above, OpenAI notes that they have made five updates to 4o since last May. Many of those — including the one behind the current debacle — were unannounced. The competition among the leading labs is fierce, and they are constantly experimenting and pushing the limits to find an edge. Many changes in that pursuit get pushed to users without explicit acknowledgment (though perhaps this begins to change with OpenAI’s commitment quoted above). It’s on us as users to stay sharp and remember that not only are these models fallible, but they’re changing all the time, for better or for worse. We need to be as intentional about monitoring the risks of these tools as we are about finding new ways to extract value from them.
The second is that, while the events of the past two weeks represent an extreme, there is a degree of sycophancy inherent to how these models work. They want to make users happy. They want the thumbs-up, and they’re optimized to get it. We believe as firmly as we ever have that these tools can make everyone better, should we choose to use them with that intention. Part of getting better is getting feedback and critique. And, much like humans who shy away from giving someone candid feedback for fear of hurting their feelings, these models tend to prefer the path of least resistance. And the path to a thumbs-up from a user is more often flattery than nuanced critique. If we want to use these models to help us get better, we need to push them in that direction (we’ve found that using the word “critique” in a prompt is particularly effective, for whatever reason) — and we need to be on the lookout for undue flattery. Again, this is not so different from how we think about human feedback.
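For readers who want something concrete to try, here is one way to bake that instruction into a request, sketched with the OpenAI Python SDK. The model name, system prompt, and draft text are placeholders of our own, not a prescribed recipe; adapt the wording to your work and your tools.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in your environment

draft = "Paste the draft you want pressure-tested here."  # placeholder text

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder; use whichever model you have access to
    messages=[
        {
            "role": "system",
            "content": (
                "You are a candid reviewer. Critique the draft: identify the "
                "three weakest points and explain why each one weakens the piece. "
                "Do not compliment the writing unless a strength is genuinely notable."
            ),
        },
        {"role": "user", "content": draft},
    ],
)

print(response.choices[0].message.content)
```

The particular phrasing matters less than the posture: explicitly ask for critique, and explicitly withhold permission to flatter.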
What happened with 4o raises other concerns that we’ll leave to others to cover. Mowshowitz, in the post we linked to above, gets into what this implies for the increasingly tall task of aligning AI models with human values and preferences, and for the labs’ challenges of balancing their business interests (which largely entail getting people to use these tools as much as they possibly can) with the safety and wellbeing of their users. Ethan Mollick, in a post of his own, calls attention to the nuanced perils stemming from AIs’ personalities and persuasive capabilities. As we write nearly every week, we don’t expect things to slow down anytime soon. This will not be the last case of unexpected model behavior. Ultimately our advice is simple: stay on your toes. Remember that these tools, for all their promise, are not infallible. The risk profile is evolving at the same pace as the opportunity landscape. Getting the most out of them requires staying on top of both.
OpenAI Falls Prey to the Ironies of Automation
Last week’s dustup was predictable.
As you can tell from the prior post, the ChatGPT-4o mess really got our attention. Not only did it raise concerns about what’s going on with the safety program at OpenAI, it also made us reflect on our own use of generative AI in our work and in our AI for leaders, ALEX (which seems to pass sycophancy tests well, and which we subject to a large battery of “red team” quality assurance tests and “A/B” tests before every update).
Again, you can read OpenAI’s take on the matter here. The short story is that reinforcement learning from human feedback (RLHF) seems to have played a significant role. For those new to the term, RLHF is a training method that aligns AI models with human preferences: human evaluators compare different model outputs to create a reward model that predicts human preferences, and later stages of training are automated using that reward model itself. You may have had ChatGPT give you two responses to a query and ask you to pick the one you prefer. That’s part of their RLHF process. ChatGPT also has a “thumbs up / thumbs down” button for every reply, which is part of the RLHF process, too. As the 4o issue demonstrates, though, RLHF can optimize a model for unexpected behavior: in this case, working so hard to drive engagement and make the user happy that the model was left ill-prepared for nuanced disagreement, critical push-back, or, in some cases, patently unhinged user behavior.
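To make that mechanism a bit more concrete, here is a minimal, hypothetical sketch of the preference-modeling step at the heart of RLHF: a reward model is trained so that the response a human picked scores higher than the one they rejected, and that learned reward then guides further fine-tuning of the language model. The names, dimensions, and data below are ours, chosen for illustration; this is not OpenAI’s actual pipeline.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyRewardModel(nn.Module):
    """Stand-in for the reward model: scores a response with a single number.
    In practice this is a large transformer; a linear layer keeps the sketch small."""
    def __init__(self, embedding_dim: int = 16):
        super().__init__()
        self.score = nn.Linear(embedding_dim, 1)

    def forward(self, response_embedding: torch.Tensor) -> torch.Tensor:
        return self.score(response_embedding).squeeze(-1)

def preference_loss(reward_chosen: torch.Tensor, reward_rejected: torch.Tensor) -> torch.Tensor:
    # Bradley-Terry-style objective: push the preferred response's score
    # above the rejected one's. A thumbs-up effectively supplies the "chosen" label.
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()

# One illustrative training step on made-up data.
reward_model = ToyRewardModel()
optimizer = torch.optim.Adam(reward_model.parameters(), lr=1e-3)

chosen = torch.randn(8, 16)    # embeddings of responses evaluators preferred
rejected = torch.randn(8, 16)  # embeddings of responses they passed over

loss = preference_loss(reward_model(chosen), reward_model(rejected))
loss.backward()
optimizer.step()
```

The 4o episode lives in that objective: whatever reliably earns the “chosen” label, flattery included, is exactly what the model learns to produce more of.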
OpenAI seems to have fallen prey to predictable second-order consequences of automation, some of which have been in the automation literature for over 40 years. We’ve written before about Lisanne Bainbridge’s work on the “ironies of automation.” You may read her original paper, which others have cited more than 3,200 times, here, and here’s Claude’s summary:
Lisanne Bainbridge’s 1983 paper “Ironies of Automation” discusses how automating industrial processes often creates new problems rather than eliminating human operator challenges. The central paradox is that automated systems still require human supervision, making them human-machine systems where both technical and human factors remain critical.
The paper identifies several key ironies. First, designers who view human operators as unreliable often leave them responsible for tasks too complex to automate. Second, operators who monitor automated systems lose manual skills through disuse, yet must take over during emergencies when exceptional skills are most needed. Third, operators need deep process knowledge to handle abnormal situations, but this knowledge deteriorates without regular application.
Monitoring issues present another irony—humans are poor at maintaining vigilance when events are rare, yet are expected to monitor systems implemented precisely because they outperform humans. The paper also addresses how automation can degrade job satisfaction when operators lose meaningful skills while retaining responsibility.
Bainbridge explores several potential solutions, including maintaining skills through simulation training, better alarm systems, and thoughtful display design. The final section proposes human-computer collaboration instead of replacement, suggesting various ways computers might support human decision-making through providing advice, mitigating errors, creating sophisticated displays, and helping during high workload periods.
The paper concludes that resolving these automation ironies may require even greater technological ingenuity than the original automation itself—a conclusion that remains remarkably relevant in today’s world of increasingly autonomous systems.
The key insight is that these ironies are really dilemmas—unsolvable problems that humans must continually manage, rather than resolve. And when reading OpenAI’s explanation of the 4o issue, we can see several of Bainbridge’s ironies at work.
One is the “impossible monitor.” Bainbridge argues that once decisions are encoded in software “there is no way in which the human operator can check in real-time that the computer is following its rules correctly… the human monitor has been given an impossible task.” The system is simply too fast and complex to adequately monitor. OpenAI’s update stacked new reward signals and memory features, and while their testing processes said all was well, the live model drifted into flattery (and worse). Engineers discovered the defect only after external complaints because they were unable to detect the subtle behavioral shift. This can lead to “camouflaged failure,” where errors sneak up on monitors. Humans watching a system see a calm panel with everything appearing normal until a crisis erupts, at which point they need to intervene without context. In the case of 4o, answers looked smooth and friendly, masking the model’s failure to reason or challenge until users encountered the problem at scale.
Another is complexity outrunning comprehension. At some point, automated systems get so complex that the human operators no longer really understand how the thing works, which puts even more pressure on their role. GPT-4o’s architecture—invisible mixtures of supervised tuning, RLHF, memory, and prompt engineering—may have become so intricate that senior reviewers missed a behavioral regression that their own gut feelings had flagged. The system’s sophistication outpaced the evaluators’ mental model.
And that’s the real lesson (and concern) with large language models. We end up putting a large amount of faith in a system that is simply too complex for us to adequately understand, monitor, or predict. As Bainbridge notes, this is a dilemma, meaning we can only manage it, not resolve it. In this case, that management likely means significantly increased and altered testing regimens for OpenAI.
We don’t talk about AI safety much in Confluence, but one of the reasons we use Anthropic and Claude for ALEX is that we admire their approach to these issues. For readers, the day-to-day issue is that your own work can fall prey to the same dilemmas. These tools are amazing, but to use them responsibly you need to commit to sound practices of not trusting everything they say, checking their work, and using them often enough that you get a nose for where they succeed and where they fail. It’s very, very easy to fall asleep at the wheel. You probably would not blindly trust the work done for you by a colleague. You shouldn’t blindly trust the work of generative AI, either. Trust, but verify.
How LLMs Are Reshaping Our Words
And what we lose when they do.
At Confluence we spend a lot of time thinking about two intertwined subjects: generative AI and writing. More and more, the conversations merge as we ask how language models shape not only the words we publish but also the habits that guide our craft. That question echoes the research on “skill erosion,” the pattern where we outsource hard tasks to technology and slowly lose the edge those tasks once gave us. Writing increasingly sits on that same slope. Each time we hand a blank page to an AI, the muscles behind our distinctive voice relax a little.
A recent article in The Atlantic pushes the concern further. Reporter Victoria Turk describes experiments where people rewrote their initial drafts of a simple classified ad after reading an AI‑generated version. Their second drafts ran significantly longer (87 words on average versus 32.7 words initially), adopted nearly identical phrasing, and carried the polished cadence we now recognize as “GPT-like.” Remarkably, this shift happened after just a single exposure, even without explicit instructions to mimic the AI style.
Linguists in the story suggest this isn’t a one‑off curiosity. If we keep bathing in AI‑written prose, which is likely as AI-produced text continues to flood our inboxes and newsfeeds, English itself could drift toward the single, standardized style natively embedded in the models. The danger shows up at work first: Your memo might look flawless, yet leave no sense of who wrote it or why it matters. When every executive summary sounds like every other, readers forget the author—and the message.
That drift carries consequences. Philip Seargeant, an applied linguist at the Open University, described students’ AI-augmented work as “Perfect, but in a very bland and uninteresting way,” capturing the paradox: technical excellence can still leave the reader cold. The good news is that not everyone is ready to surrender. Some writers lean into idiosyncrasy as a quiet form of resistance, re‑introducing odd turns of phrase or deliberate imperfections that signal a living mind behind the keyboard. It’s like the difference between a home-cooked family recipe and a frozen dinner: one carries history and personality, the other just gets the job done.
Keeping that character doesn’t require abandoning AI; it demands a mindful process. Draft first in your own words, then invite the machine to polish. Track the rhythms, colloquialisms, and word choices that make your voice unmistakable, and restore them after the edit pass. Read plenty of human prose—novels, long‑form journalism, even good email threads—to keep your internal compass tuned to variety. Before you hit send, ask whether a colleague could name you as the author after one paragraph; if the answer feels shaky, add a touch of personality.
AI will write more of what the world reads tomorrow than it does today, and that’s fine. We can still choose whether to let it sand down every edge or leave room for the quirks that make writing memorable. A bit of vigilance and pride in your own perspective keeps “the great language flattening” at bay, and keeps readers engaged.
Another View on AI’s Future
What if generative AI isn’t different from what’s come before?
“AI as Normal Technology” by Arvind Narayanan and Sayash Kapoor, two AI researchers from Princeton, presents a distinct vision for AI’s future. They argue that even with its transformative potential, AI will follow similar patterns of adoption and diffusion as past general-purpose technologies. In their view, truly transformative economic and societal consequences will take decades to emerge, not years; people will remain in control of AI systems; and the concept of “superintelligence” itself may be fundamentally incoherent. Holding these beliefs, in turn, leads to a different approach to managing the risks of powerful AI.
One way to understand their view is through the lens of overhang. Even with rapid technical advances in AI capabilities, the ability of people, organizations, and society at large to apply and integrate these advances will severely lag behind. A key driver of overhang is what the authors identify as the “capability-reliability gap”—while AI may show impressive performance in controlled settings or benchmarks, delivering reliable performance in consequential, real-world applications takes much longer. In this future, progress advances unevenly across industries and applications, with occasional setbacks as we identify new risks and limitations.
As with other perspectives on the future we’ve shared, we highlight “AI as Normal Technology” not because we believe the authors are right or wrong. We present these ideas because they raise real questions and insights that challenge our thinking about generative AI, and because they help us make better decisions about how we use the technology ourselves and how we advise others.
If there’s one point from this piece we’d want our readers to consider, it’s that “an important consequence of the normal technology view is that progress is not automatic—there are many roadblocks to AI diffusion.” We agree that progress is not automatic. We all make choices about how we can and should use and work with generative AI. In our organizations and with our teams, we can set norms and expectations that shape what progress looks like in our worlds. In that spirit, if we were to revise the quote above to better reflect our views, it would read “an important consequence of the normal technology view is that progress is not automatic—the conversations we have and the choices we make will help shape it.”
We’ll leave you with something cool: Samsung is opening a free pop‑up restaurant in Portugal next week where Galaxy AI turns diners’ ingredient photos into unique recipes that a live chef then cooks.
AI Disclosure: We used generative AI in creating imagery for this post. We also used it selectively as a creator and summarizer of content and as an editor and proofreader.