Confluence for 2.2.2025
Human factors are getting more attention. More evidence that LLMs exceed human creativity. Another free reasoning model. An updated guide to which AI to use. Gemini gets an image generation update.

Welcome to Confluence. Before we get into what has our attention this week, we want to let readers know that, on February 18 and 19, we are leading our next seminar on generative AI for communication professionals. We’ve led a number of these in the past 18 months, and we cover the fundamentals of how the models work, and how you can use them for editorial assistance, content generation, analysis, thought partnership, and more. The seminar is virtual, and we do have a few spaces open. If you would like to learn more or join us, you may do so here.
With that said, what has our attention this week at the intersection of generative AI and corporate communication:
Human Factors Are Getting More Attention
More Evidence that Large Language Models Exceed Human Creativity
Another Free Reasoning Model
An Updated Guide to Which AI to Use in 2025
Gemini Gets an Image Generation Update
Human Factors Are Getting More Attention
A recent podcast raises dilemmas that we’ll all need to manage.
In our work with clients regarding generative AI, we always make a point to consider what we call the “human factors” — the subtle but critical implications of AI adoption for skill and talent development. These discussions have taken on new urgency given the increasingly confident assertions from AI labs that we’re approaching models that will be better than “almost all humans at almost everything.”
The topic surfaced last week on Derek Thompson’s Plain English podcast, where New York Times technology columnist Kevin Roose articulated several dilemmas we’ve been discussing with clients for more than a year. The relevant discussion starts about 25 minutes into the podcast if you want to give it a listen. Three points Roose raises align closely with how we’ve been thinking about and working through these issues with clients.
First, there’s the risk to skill development when using generative AI. Every profession has foundational tasks typically handled by junior talent that, while sometimes tedious, build essential capabilities needed for more senior roles. Consider the communication professional who learns messaging strategy by writing dozens of press releases, or the financial analyst who develops market understanding by building countless spreadsheet models. Today’s AI models — particularly advanced ones like Claude 3.5 Sonnet — can already execute many of these tasks at an equal if not higher level of quality than most junior talent. While this may feel further out to some because their organizations don’t yet have access to these tools within their firewalls, the tools exist, and they will be inside nearly every organization in the near future.
The second challenge centers on skill degradation. Not every capability is like riding a bike — some skills deteriorate without regular practice. Writing stands out as a prime example. While we leverage AI for editing and proofreading Confluence — which can actually improve our writing — we deliberately choose not to outsource our writing process wholesale. The act of writing itself, of wrestling with ideas and their expression, keeps our skills sharp and our thinking clear. There are times when outsourcing a task to generative AI might save you time in the short term, while diminishing your capabilities over the long term. For some skills that might be okay — once the automobile became widely available, it didn’t really matter that most people could no longer skillfully ride horses — but we need to be mindful of the skills we want and need to maintain.
The third point Roose raises — and the one that will be the most challenging to manage — involves succession planning. If organizations simply replace junior-level work with generative AI and stop hiring for those positions, who will develop into tomorrow’s senior leaders? For the junior talent that we do hire, how do we make sure we select them for the skills they’ll need to lead teams in the future while giving them opportunities to hone those skills in the short term? These aren’t questions we can answer quickly, but we’re already wrestling with them in our own firm and with our clients.
We’d add one crucial observation to these three points: these are dilemmas to manage, not problems to solve. There’s no universal answer that will magically eliminate these challenges. As AI labs roll out increasingly capable models and more organizations bring these tools inside their firewalls, leaders will need to regularly reassess and adjust their approach.
These challenges are crucial, and, frankly, we don’t believe they’ve received enough attention to date. The Thompson podcast marks one of the first times we’ve seen these issues addressed in a major public forum, and we expect these questions to become increasingly prevalent in the discourse surrounding generative AI. In the meantime, we’ll keep raising them.
More Evidence that Large Language Models Exceed Human Creativity
But the real story may be about pattern recognition.
A recently published preprint from the University of Lausanne caught our attention this week: Creative and Strategic Capabilities of Generative AI: Evidence from Large-Scale Experiments.1 The headline finding is straightforward: When participants were asked to tackle creative challenges — imagining future cities, devising novel inventions — ChatGPT-4 consistently outperformed humans. Not marginally, but significantly. Human raters considered the AI more creative across dimensions of originality, surprise, and usefulness. Most telling: ChatGPT’s outputs represented nearly half of all responses rated in the top 1% for creativity. And this is ChatGPT-4, which is no longer considered a frontier model. Before we get too carried away with that finding, we should note that creativity doesn’t necessarily equate to effectiveness. But still.
Two other findings also caught our attention. The first was that when faced with strategic challenges requiring adaptation to opponent behavior in a simple game, humans outperformed the AI in strategic thinking. The second was about gender differences. Women showed decreased creative performance when told they were competing against AI, while men’s performance remained stable. We’re not sure what that might mean, but we’ve written before about other research that shows gender differences related to generative AI use, and the topic raises important questions about how organizations might introduce and frame AI tools.
Overall, the study is another piece of evidence about where these tools can create real value. They’re remarkably good at recognizing patterns, and their creativity seems to stem from an unprecedented ability to identify and recombine patterns across vast amounts of data. And while they can see patterns at immense scale, they can struggle with the kind of adaptive reasoning that often defines strategic thinking (at least for now, as we have yet to really see how the new reasoning models like o1 and o3-mini perform in this space). That said, we work in a world full of patterns. With large language models, the patterns we most often think of involve text, but increasingly they also include images (artistic styles they can invoke and videos they can create). There will be more: in medicine (diagnosis, radiology), law (contracts), customer behavior, market trends, communication patterns, and protein synthesis, to name a few (and you can read 50 more in the footnote).2 That’s the real finding here.
Another Free Reasoning Model
And this time, it’s available in ChatGPT.
In last week’s Confluence, we noted that DeepSeek-R1 represented the first free, frontier-level reasoning model. Well, it’s no longer the only one. This past Friday, OpenAI released o3-mini — “the newest, most cost-efficient model in [OpenAI’s] reasoning series” — to ChatGPT. In a bit of a surprise, OpenAI announced that o3-mini will be available on ChatGPT’s free tier as well, which “marks the first time a reasoning model has been made available to free users in ChatGPT.”
In our limited experimentation with the model so far, we’ve noticed that — compared to its predecessor, o1-mini — o3-mini feels like a step up, in terms of both speed and intelligence. It’s also quite clearly not as powerful as o1 Pro, which we wrote about on December 22. The release of o3-mini, including to free users, is perhaps more notable as a harbinger of what’s coming than for what it represents in its own right. In our December 22 post on o1 Pro, we wrote the following:
Most users won’t pay $200 a month for o1 Pro. Compared to an intern’s salary, however, it’s a bargain. But the price isn’t really the point. The point is that the technology behind o1 Pro is coming soon to models more widely available at lower price points (be those models from OpenAI, Anthropic, Google, or others). The state of the art comes quickly in this field. Just as Stable Diffusion’s video capability was novel 12 months ago but is now more broadly and cheaply available, we can expect the same of the reasoning and ability of o1 Pro — with o3 (and whatever the frontier models look like after that) soon to follow.
We’re one month into 2025, and that’s exactly how things are playing out so far. The state of the art comes quickly indeed. DeepSeek’s entry into the competition will likely continue to accelerate development and release cycles. As the frontier advances, what was once the frontier becomes more widely available at lower price points. That o3-mini is available for free is just one example. We expect there will be much more to come in the weeks and months ahead — including the full release of the o3 and o3 Pro models, which will likely take o1 Pro’s place on the leading edge of the frontier and catalyze yet another round of this cycle.
An Updated Guide to Which AI to Use in 2025
Ethan Mollick’s latest guide to AI tools reflects today’s landscape.
Just over a year after sharing his original guide to AI tools, Ethan Mollick has updated his recommendations to reflect the latest landscape. We often find ourselves referring to Mollick’s work because he consistently provides clear, thoughtful analysis of the AI landscape. His latest piece, “Which AI to Use Now: An Updated Opinionated Guide,” offers valuable insights for anyone trying to understand the current state of large language models. If you’re working to make sense of the various AI options available today, it’s worthwhile reading.
Despite a seemingly ever-growing playing field, there are now three primary choices for most users: Claude from Anthropic, Google’s Gemini, and, of course, OpenAI’s ChatGPT. Each has distinct advantages — ChatGPT offers the most comprehensive feature set including an excellent Live Mode, Gemini provides strong search integration and superior image/video capabilities, and Claude (specifically Claude 3.5 Sonnet) stands out for its insightful responses despite having fewer features than its competitors.
For what it’s worth, we’ve found ourselves defaulting to Claude Sonnet. Though we wrote about DeepSeek last week (and discussed other reasoning models above) and continue to explore ChatGPT’s o1 Pro, Claude has been our go-to for the majority of use cases for some time now. In fact, it’s the foundation for our executive coach AI, ALEX, a choice that reflects our confidence in its capabilities.
That said, the most important thing is to choose the AI that consistently delivers the best results for your specific needs. Staying informed about new developments in the field is valuable, too — and we’ll continue keeping you updated as the landscape evolves. The pace of change in AI is remarkable, but don’t let that paralyze you. As Mollick emphasizes and as we’ve written before, the secret isn’t waiting for the perfect AI — it’s diving in, experimenting, and discovering what works best for you right now.
Gemini Gets an Image Generation Update
Frontier image generation just got more accessible.
This week, Google updated Gemini to use the latest version of its image generation model, Imagen 3. While we don’t spend much time working with Gemini, this update caught our attention for a few reasons.
First, Imagen 3 is the best image generation model we’ve seen outside of Midjourney. As a point of comparison, here is what Gemini created when given the same prompt that we used with Midjourney to create today’s cover image. In the interest of transparency, we did ask Gemini to create four different versions of this image, just as Midjourney would.

Quite different, but interestingly, we see it as truer to the style of Amelia Peláez than what Midjourney created.3 Regardless of which version you prefer, the output from Imagen 3 is at the very least in the same class as Midjourney’s.
What makes Imagen 3 notable, beyond the quality of the images it creates, is how it’s embedded within Gemini. There’s no need to go to a separate website or Discord channel. Simply open Gemini and ask it to “Create an image of …” and it will do so. And you do not even need a Gemini Advanced account to do so — this functionality is available to all users, for free.
This is just another example of how frontier, or near frontier, generative AI capabilities become more and more accessible over time. Just last summer, frontier image generation required “chatting” with Midjourney through Discord and a $20 per month subscription. Now you can get something that is, at the very least, near the frontier for free through the more familiar chatbot interface.
If you haven’t spent time with Midjourney, we suggest putting Gemini and Imagen 3 through their paces. If we’re going to make smart decisions about how and when to use this technology, we need to stay current with what these tools are capable of doing.
We’ll leave you with something cool: Benjamin Breen on how he uses generative AI for historical research.
AI Disclosure: We used generative AI in creating imagery for this post. We also used it selectively as a creator and summarizer of content and as an editor and proofreader.

1. We asked our Claude Executive Summary Project to summarize the paper:
Summary
A University of Lausanne study examined generative AI capabilities through experiments with 4,000 participants. The research compared AI chatbots ChatGPT and Bard with human performance on creative and strategic tasks.
ChatGPT surpassed average human performance in creativity tests. The system generated responses that raters scored higher than human submissions on originality, surprise, and usefulness. Human performance improved with AI assistance but fell short of ChatGPT's standalone capabilities.
Competition from AI affected participants by gender. Female participants showed decreased creative performance when competing against AI. Male participants maintained consistent output levels, matching previous studies on gender differences in competitive environments.
Raters failed to identify AI-generated content, correctly classifying human responses 63% of the time and ChatGPT responses 61% of the time. When raters believed content came from AI, they assigned lower scores regardless of source.
In strategic reasoning tests using Rock-Paper-Scissors, both humans and ChatGPT adapted play against biased opponents. Humans outperformed ChatGPT by employing dominant strategies.
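To make “dominant strategy” concrete, here is a minimal sketch of what exploiting a biased Rock-Paper-Scissors opponent looks like: estimate the opponent’s move frequencies from past rounds, then always play the counter to their most likely move. This is our own illustrative example, not code or a protocol from the study.

```python
from collections import Counter

# Rock-Paper-Scissors: which move beats which.
BEATS = {"rock": "paper", "paper": "scissors", "scissors": "rock"}

def best_response(opponent_history):
    """Play the move that beats the opponent's most frequent move so far.

    Against an opponent biased toward one move, this simple exploit
    maximizes expected wins. (Illustrative sketch only; the study's
    actual experimental setup may differ.)
    """
    if not opponent_history:
        return "rock"  # arbitrary opening move
    most_common_move, _ = Counter(opponent_history).most_common(1)[0]
    return BEATS[most_common_move]

# Example: an opponent biased toward rock should be met with paper.
history = ["rock", "rock", "scissors", "rock", "paper", "rock"]
print(best_response(history))  # -> paper
```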
The research team emphasized implications for workplace AI integration, suggesting organizations need targeted training to ensure AI tools benefit employees equitably. While AI shows promise in creative and strategic tasks, human-AI collaboration requires development to maximize potential.
Critique
The study combines strengths and limitations.
Strengths include large sample size, clear metrics, and controlled conditions. The methodology combines quantitative scoring with semantic analysis to produce reliable data about AI capabilities.
Weaknesses stem from reliance on subjective creativity ratings. While raters showed consistency, personal bias affects creative assessment. The strategic test uses a simple game, limiting insight into complex decision-making.
The methodology misses real-world conditions. Time pressure, stakes, and workplace dynamics could change results. The study cannot account for AI’s rapid development; findings might not reflect current capabilities.
Interpretations, Observations, and Inferences
AI creates ideas but lacks understanding of why ideas work. The data suggests pattern recognition drives AI creativity rather than innovation.
The gender response to AI competition mirrors workplace dynamics. Women reduced creative output when competing with AI, a pattern seen in other competitive environments, raising concerns that AI adoption could increase workplace disparities.
AI strategic reasoning remains basic. Though ChatGPT adapted game strategy, it missed dominant moves humans found. This gap indicates AI excels at pattern matching but falls short on strategic thinking.
Human bias against AI content creates barriers to adoption. Raters scored suspected AI content lower regardless of source. This prejudice could slow workplace AI integration despite benefits.
2. We asked OpenAI’s o3-mini-high model “LLMs are clearly good at pattern recognition and replication. Give me a list of patterns that exist in human enterprise and professions. Contracts is one. Radiology scans is another. Give me 50.” Here’s the list:
Legal Contracts: Standardized clauses and boilerplate language that somehow still leave room for dispute.
Radiology Scans: Repeating imaging patterns that tell a story no human could describe in a single sentence.
Financial Statements: Predictable formats balancing assets, liabilities, and the occasional “creative accounting” twist.
Software Design Patterns: Recurring solutions like the Singleton or Observer that, despite their fame, still confuse beginners.
Project Management Workflows: Gantt charts, sprints, and Kanban boards that chart progress like a well-oiled machine.
Business Process Maps: Flowcharts that visually scream “we’ve done this before” at every decision node.
Legal Case Precedents: Patterns of reasoning in judicial decisions that make law look a bit like a series of well-choreographed dances.
Scientific Research Methods: The timeless progression of hypothesis, experiment, observation, and – sometimes – revision.
Educational Curricula: Structured progressions of knowledge that mirror the “step-by-step” nature of building expertise.
Marketing Campaigns: Repeated themes and messaging blueprints that try to predict consumer behavior (spoiler: it’s never 100%).
Quality Assurance Protocols: Checklists and routines ensuring every detail is in its place—because even the best systems need a QA pass.
Standard Operating Procedures (SOPs): Documented steps that standardize operations from the mundane to the miraculous.
Medical Diagnosis Algorithms: Step-by-step symptom checklists leading to a diagnosis, with a side of differential diagnoses.
Sales Funnels: The predictable narrowing journey from awareness to conversion (and the occasional drop-off).
Negotiation Frameworks: Tactics and counter-tactics that form a pattern in deal-making, whether in boardrooms or bazaars.
Research Grant Proposals: Structured formats that distill creative ideas into a series of checkboxes and funding justifications.
Employee Onboarding Processes: A repeatable sequence of trainings, introductions, and the obligatory “welcome to the team” email.
Incident Reports: Standardized templates for documenting mishaps—because every crisis needs its forensic replay.
Market Analysis Reports: Predictable layouts dissecting trends, data, and what-ifs in a way that even economists appreciate.
Editorial Calendars: Planning patterns for content release that ensure no idea goes unpublicized (or forgotten).
Advertising Layouts: Design and copy frameworks that keep your brand visually and verbally on message.
Crisis Management Plans: Pre-structured responses designed to tame chaos with a cool head (if only life were so predictable).
Brand Guidelines: Consistent rules about logos, fonts, and tones that ensure every piece of communication feels “on brand.”
Business Proposals: Structured formats for pitching ideas that follow the same arc as a well-rehearsed sales pitch.
Customer Service Scripts: Dialogues and prompts that repeat across calls to ensure consistency—and sometimes to defuse frustration.
E-commerce Checkout Flows: Stepwise paths from cart to purchase, designed to minimize abandonment and maximize impulse buys.
Medical Treatment Protocols: Prescribed care sequences that standardize patient treatment (with built-in checks for exceptions).
Patent Applications: Detailed, repetitive disclosure formats designed to protect ideas down to the last comma.
Tax Filing Procedures: Yearly routines filled with forms, deadlines, and a dash of confusion.
Performance Reviews: Regular feedback cycles that, despite their predictability, never fail to spark debate.
Expense Reports: Standardized documentation for every coffee and conference fee, ensuring every penny is accounted for.
Meeting Agendas: Structured lists that transform chaotic discussions into orderly, if sometimes tedious, proceedings.
Supply Chain Workflows: Recurring steps in managing inventory and distribution, where every link in the chain matters.
Workflow Automation Sequences: Trigger-action patterns that let machines do the repetitive heavy lifting.
Data Analytics Dashboards: Visual patterns that transform raw numbers into insights (or at least pretty graphs).
Job Descriptions: Consistent outlines of roles and responsibilities that help everyone know what to expect—more or less.
Business Plans: Repeating sections (market analysis, strategy, finances) that provide a blueprint for entrepreneurial hope.
User Interface (UI) Patterns: Design components like navigation menus and buttons that users learn to love (or hate) over time.
Email Marketing Templates: Preformatted messages designed to catch attention in an inbox crowded with novelty.
Social Media Content Strategies: Recurring post types and engagement tactics that try to hack virality.
Architectural Blueprints: Standardized symbols and layouts that turn a visionary design into a buildable plan.
Public Policy Documents: Structured proposals and reports that shape the discourse of governance.
Customer Feedback Surveys: Consistent question formats designed to capture insights from every demographic.
Clinical Trial Protocols: Standard phases and data collection methods that ensure scientific rigor.
Inventory Management Systems: Patterns in tracking stock levels, orders, and deliveries—a necessary rhythm in retail.
Procurement Processes: Repeated steps from tendering to purchasing, designed to keep supply chains moving.
Investment Portfolio Structures: Recurring asset allocation strategies that balance risk and reward.
Energy Consumption Reports: Regularly formatted analyses of usage patterns, sometimes as riveting as a fiscal quarter.
Sports Strategy Playbooks: Repeating tactics and formations that coaches tweak game after game.
Event Planning Checklists: Standardized steps from concept to execution, ensuring nothing falls through the cracks (not even the confetti).
3. If you’re interested in learning more about Amelia Peláez, we’d encourage you to check out the Amelia Peláez Foundation. You can also determine which image, Midjourney’s or Imagen 3’s, is closer to the artist’s style.
