Confluence for 9.14.25
The wizard problem. Claude can use mobile apps. Claude crosses the workflow threshold. A week with generative AI.
Welcome to Confluence. It was a big week for Claude, even without the release of any new models from Anthropic. With a few feature updates, Claude can now use select mobile apps, create and work directly with common Office files, and draw on more comprehensive memory. All three of these developments remind us yet again of the degree of overhang in this technology. None of these advances requires a more capable model — they are simply expanded applications of what already exists. With that said, here’s what has our attention this week at the intersection of generative AI, leadership, and corporate communication:
The Wizard Problem
Claude Can Use Mobile Apps
Claude Crosses the Workflow Threshold
A Week with Generative AI
The Wizard Problem
The more models improve, the harder it is to stay vigilant.
This week, almost exactly two years after introducing the metaphors of “cyborgs” and “centaurs” as modes of working with generative AI, Ethan Mollick introduced a new one: the wizard. Here’s Mollick introducing the idea:
… [with] the new wave of AI, for an increasing range of complex tasks, you get an amazing and sophisticated output in response to a vague request, but you have no part in the process. You don’t know how the AI made the choices it made, nor can you confirm that everything is completely correct. We’re shifting from being collaborators who shape the process to being supplicants who receive the output. It is a transition from working with a co-intelligence to working with a wizard. Magic gets done, but we don’t always know what to do with the results. This pattern — impressive output, opaque process — becomes even more pronounced with research tasks.
The upside of this dynamic is that the new models, as Mollick puts it, are “capable of some frankly amazing feats.” We see this frequently with our AI-generated Confluence podcast. The podcast — which takes about five minutes to create — not only summarizes each week’s insights and transitions smoothly between topics, but often surprises us with its own insights and by making connections we hadn’t noticed ourselves.
One major downside of this dynamic — which we’ve also seen with the Confluence podcast — is that it increases the risk of falling asleep at the wheel. The more sophisticated, confident, and complex the outputs of these models get, the harder it is to stay vigilant in ensuring quality. For each episode of the podcast we generate, we listen to the entire 15-20 minute episode to validate its quality. The podcast could be completely accurate for 19 minutes and 30 seconds, but all it takes is one mistake to tarnish the entire thing. We actually fell victim to this a few weeks ago, realizing only after publishing an episode that the podcast had mispronounced the name of our virtual leadership coach, ALEX, as “A-L-X” — a small error, but one our listeners caught.
These risks fall into the broader category of automation bias, which Claude Opus 4.1 summarized for us as follows:
Automation bias is the tendency for people to over-rely on automated systems and favor suggestions from automated decision-making systems, even when those suggestions contradict other available information or their own better judgment.
This bias is particularly relevant to your work at the intersection of AI and leadership communication. As organizations increasingly integrate AI tools into decision-making processes, leaders need to maintain critical thinking rather than defaulting to AI recommendations. The bias tends to be stronger when:
People are under time pressure or cognitive load
The automation has been reliable in the past
The system presents information with high confidence
Users lack expertise in the domain
We know people are increasingly under time pressure and dealing with high cognitive load, and we’ve already alluded to these models’ general reliability and the confidence with which they present information. But the fourth factor — users lacking expertise in the domain — is just as important. Mollick observes that part of the wizard’s magic lies in its opacity: “the wizard [doesn’t] tell [us] the secret to its tricks.” This becomes especially risky when we ask AI to perform tasks outside our own expertise. How could we even begin to verify the soundness not just of the output, but also of the process used to create it, if we don’t know what a quality output or process should look like?
These challenges aren’t going away, but neither are the benefits. The question, as always, is how to harness the power of these models while maintaining necessary vigilance — especially as the potential power increases and the requisite vigilance gets harder. For anything with reputational stakes, we recommend the following checklist as a start:
First, maintain rigorous validation for any output, no matter how convincing or confident that output appears.
Second, scrutinize the process, not just the product. Ask the AI to show its work, break down its reasoning, or explain its choices. We may not be able to see the AI’s entire process (given these models’ increasing opacity), but we should push to see and vet as much of it as we can.
Third, resist the wizard’s most seductive promise: expertise in domains where you have none. The further you stray from your own expertise, the less equipped you are to catch any mistakes in the AI’s output or lapses in its process.
Finally, recognize when to collaborate with AI versus when to delegate to it entirely — a distinction we explored here. The wizard may be powerful, but we’re still the ones who must decide when to trust the magic.
Claude Can Use Mobile Apps
An example of how LLMs smooth out routine work.
Perhaps the most mundane of the Claude updates this week is its ability to work directly with iOS and Android system apps. Claude can now, as Anthropic puts it, “connect with your iOS [or Android] device’s system apps to help you take action directly from your conversations. When you chat with Claude, it can draft messages, emails, or calendar events, find locations, and manage reminders — all seamlessly integrated with your mobile apps.”
In practice, this turns the Claude mobile app into an agent with a limited, but still quite useful, suite of capabilities. For instance, you can work on an email draft in a conversation within the Claude iOS app and, presuming you’ve told Claude it’s an email, it will present the option to “Send Email.” Simply tap it, and Claude can populate a draft message in the Mail or Messages app (or select third-party apps, including WhatsApp). All you need to do then is add the contact information and press send.
The calendar integration is even more straightforward. Tell Claude to create a calendar appointment, and it will do it. You can then open the event in your calendar with a single tap. The location capabilities, while not yet available on Team or Enterprise plans, should prove useful in specific use cases. You can ask Claude about the local weather or ask for restaurant recommendations in your area rather than needing to specify where you are.
These narrow capabilities reduce the friction between the user and Claude for routine work. On your phone, it can be much easier to dictate a lengthy email to Claude than to type it out. Instead of copying, pasting, and shuffling between apps, you tap the screen once and add the right email address to the “to” line. You can open Claude, dictate a calendar appointment or reminder, and it’s done. Professional workers perform dozens of tasks each day involving extra steps, button presses, or finger taps, all creating friction and extending completion time. These capabilities show how LLMs are beginning to smooth out routine work. As they become more integrated into our apps and operating systems, we expect this trend to accelerate, with natural language instructions replacing multi-step processes for even the most mundane professional tasks.
Claude Crosses the Workflow Threshold
From conversation partner to embedded colleague.
This post was written by Anthropic’s LLM, Claude Opus 4.1. We gave directional guidance and feedback only, and present Claude’s text here without editing.
This week, Anthropic announced several developments for Claude, but two in particular signal a fundamental shift in how large language models integrate with professional work. The first is memory: Claude can now remember context about you, your team, and your projects across conversations. The second is file creation: Claude can generate and edit actual Excel spreadsheets, Word documents, PowerPoint presentations, and PDFs. While these might sound like incremental feature additions, they represent something more profound: the transition of LLMs from sophisticated Q&A tools to persistent collaborators embedded in our daily workflows.
Anyone who has worked extensively with LLMs knows the ritual of context-setting that begins each conversation. You explain your role, your company’s focus, current projects, and relevant background, every single time. Claude’s new memory feature eliminates this friction for Team and Enterprise users, maintaining separate memories for different projects and allowing granular control over what’s remembered. When an AI assistant maintains context across weeks and months of work, it functions less like a tool and more like a team member who understands your organization’s evolving priorities. Sales teams can maintain client context across deals, product teams preserve specifications across sprints, and executives track initiatives without constantly rebuilding strategic context.
The file creation capability may be even more significant. Claude can now create actual, downloadable files through what Anthropic describes as “access to a private computer environment where it can write code and run programs.” A marketing manager can upload raw survey data and receive back a polished Excel workbook with cleaned data, statistical analysis, charts, and written insights. Meeting notes transform into formatted documents. PDF reports become PowerPoint presentations. Claude moves from being an advisor who tells you what to do to an active collaborator who does the work alongside you. For many routine analytical and document creation tasks, the distance between idea and execution has collapsed to the length of a conversation.
The combination of memory and file creation is particularly powerful. Claude can remember your team’s reporting formats, understand your client’s preferences, and then generate documents that match those specifications without repeated instruction. It can track project evolution over months and produce analyses that reflect that accumulated context. This isn’t just automation of routine tasks; it’s the emergence of AI systems that function as genuine collaborators with institutional knowledge. We’re witnessing the steady march toward what many predicted: LLMs becoming as essential to knowledge work as email or spreadsheets. But unlike those technologies, which took years to fully integrate into workflows, LLMs are evolving at a pace that demands immediate adaptation.
These features initially roll out to Claude’s paying Team and Enterprise users, with memory available now and file creation following in the coming weeks. The trajectory is clear: we’re moving rapidly toward a future where AI assistants maintain ongoing relationships with teams, understand organizational context, and produce finished work products rather than just advice. For leaders, this raises immediate practical questions about governance, quality control, and workflow integration. But the larger message is unmistakable: the integration of LLMs into professional workflows is accelerating from gradual evolution to sharp discontinuity. Organizations that recognize and adapt to this shift will find themselves with capabilities their competitors can barely imagine.
A Week with Generative AI
A view from under the capability overhang.
One of your Confluence writers had an interesting experience recently on an airline flight. Seated third in his row, he overheard the two passengers next to him talking about generative AI. The person on the aisle was a pharmacist with little experience with the technology, having only “played with” ChatGPT a bit. The person in the middle seat, a pharmaceutical salesperson (what are the odds?), was at that moment using ChatGPT to create a logo for a local basketball league he sponsors. The logo creation was not going well, so your writer chimed in, helping the salesperson learn how to edit just part of the logo, which keeps ChatGPT from regenerating the entire image. This led to a conversation that moved from “Do you know how to use ChatGPT’s ‘thinking’ mode?” (no), to “How do you use it for work?” (rarely), to “Let me introduce you to ALEX,” and on to quite a bit more.
The exchange illustrates the “overhang”: the difference between what most people believe these models can do and what they can actually do. So we thought it might be of interest to list all the chats your writer has had with generative AI over the past 10 days, as a window into how an experienced user puts these tools to work in daily life. Here they are, with some color commentary, in no particular order:
Here’s a video transcript. Give me a title, description, and timestamps / chapters for it.
Give me yesterday’s scores for the college football top 25.
Here’s a photo of an old model airplane. Is this worth anything? (Turns out it’s not old, and no, it’s not.)
Create a first draft of the design of week 5 for this intensive developmental program for me. Remember our earlier chats in this project as you do.
How do I change the time on my 1895 Waltham pocket watch?
It’s been two weeks. The Siamese kittens we rescued are doing fine. The female is relaxed and snuggly and purrs. We have yet to hear the boy purr. He seems nervous most of the time. He plays with us a lot (with toys) but doesn’t seem to want affection. (Nothing to worry about.)
Out of the blue, my Quest Link cable seems to no longer be working between my Quest and my PC. The PC will charge the Quest but it doesn’t seem to be recognizing it as a USB device. The Oculus software doesn’t see the Quest 3 and the Quest 3 doesn’t see the PC. It was working fine just a couple days ago. Any ideas?
To ChatGPT: I’m facilitating this panel later this week [ATTACHMENT]. Create a research report to prepare me.
To Gemini: I am facilitating this panel later this week. Here are the panel agenda and speakers. Prepare a research report to fully prepare me for the panel.
To ALEX: Howdy ALEX. I’m facilitating a panel later this week for [CLIENT’s] VP meeting. The flow and attendees are below. What do you see as best practices for facilitating a panel?
You have in Notion the outline for my upcoming panel. Thoughts on the questions they’ve prepared?
I have a friend who has a home from the 50s and he’s really made it beautiful. Much of the internal aesthetic is mid-century modern. I love that look but it’s not my spouse’s preference. His home just had this balance to it. I have no idea how to pull that off. Ours is more confused, although contemporary coastal might describe what we like (and I like it too, though I do like mid-century modern and mission as well). I want to get to that balance in the home design. Where do I start? Here are some photos. (This leads to the creation of Projects in both Claude and ChatGPT around an extended effort to refine the interior design of the home. Subsequent chats from this series include defining the desired style (California Coastal Santa Barbara / Montecito), creating a set of specific design principles to guide the work, building a new look around photos of a home office, and getting specific redesign advice on cabinetry and built-ins in the home. Many of these chats involved giving ChatGPT and Claude photos that then guided the subsequent advice.)
I’m searching for a new desk lamp. Here’s my desk. Give me a search sentence for Google and Amazon. … What do you think of these? (Photos attached) … How about for a rug for my office?
Tell me about Kentia Palms … Show me a picture … Are any available at our local Home Depot?
What happened in the news with [CLIENT X] and [CLIENT Y] at [ORGANIZATION] this past week?
I have this beautiful antique silver watch a colleague gave me. It’s meaningful. It’s in this box. I’ll show you a picture. I’m going to get a small stand for it. All the pocket watch stands that I can find are too ornate and ugly. Do a web search for alternatives. I’m thinking there is a small stand for a phone, or a photo, or some other object that would fit the design and that I could rest the watch on so it’s propped so I can see it.
How do I say “Come here” in German?
Do the trigonometry to figure the fields of view of these three screens. The front screen is now 61 inches wide with an eye point 35 inches back.
Tell me about [ORGANIZATION].
Look at my calendar over the last 30 days and do a time audit.
Now that you have native access to my phone calendar, reminders, and Notion, review them all and give me a briefing to prepare me for the rest of my week. Consider travel, weather, clients etc. Use the web however you need. Format it in a way suitable to me as a busy executive. Thanks.
Tell me about the [STRATEGY] in the context for my [CLIENT] meeting coming up. It’s in Notion.
Please proofread this proposal for grammar, spelling, and typos. Be careful. Take your time.
Put an appt on my calendar today at 5 pm to call [PERSON].
Search my calendar and tell me what appointments I have coming up with [PERSON].
Please proofread this document for grammar, spelling and typos.
Given my age, if I were to spend only five minutes stretching before bed, what stretches should I do?
ALEX, I’m doing a session next week on our talent model and selecting great talent. In bullet form so I can add them to my notes, give me the components of the talent model (skillfulness, potential, judgement, fit), the gates we should go through, and for each of the four components, the ways to identify the right person. For the potential part, list the three talent components and the three things that make up each. All in bullet form so I can add it to my outline. Thanks.
Based on the firm’s strategy what sort of things should we be talking about in Q4?
Hi. Can you give me the best books on leadership from our book list?
I’m doing a session tomorrow on “Why Would Anyone Follow You” for about 30 senior leaders at [ORGANIZATION]. Thoughts on an exercise or ice breaker?
Give me a detailed briefing on DBT/Metabase.
Create a prompt for Midjourney based on this text: [TEXT].
World class mojito recipe? … What gin is closest to Bar Hill Tomcat gin? … Best Negroni ratios?
Tell me what section 12.8 of this contract means in light of section 12.4, especially in the context of prior intellectual property.
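For the curious, the field-of-view request in the list above is straightforward trigonometry. Here is a minimal sketch of the math for the front screen, assuming a flat screen viewed head-on with the eye point centered (the chat mentioned three screens, but only the front screen’s dimensions are given):

```python
import math

def horizontal_fov_deg(screen_width_in, eye_distance_in):
    """Horizontal field of view for a flat screen viewed head-on.

    Assumes the eye point is centered on the screen, so the FOV is
    twice the angle subtended by half the screen width.
    """
    half_angle = math.atan((screen_width_in / 2) / eye_distance_in)
    return math.degrees(2 * half_angle)

# The front-screen numbers from the chat: 61 inches wide, eye point 35 inches back.
print(f"{horizontal_fov_deg(61, 35):.1f} degrees")  # roughly 82 degrees
```

A model handles this kind of request in one turn; the point is that the underlying calculation is a two-line formula, not a mystery.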
We hope a few things stand out about this window in time. One is the breadth of topics, from IT help to contracts to trigonometry to languages to interior design to deep research. Another is the variety in the depth of the interactions, from simple Google-like requests to ongoing conversations involving custom agents with specific instructions doing multi-modal work with image interpretation and generation. A third is the range of applications across personal and professional domains. And a fourth, though readers may not notice it, is how modest these requests are relative to what these models can do. While this week didn’t require it, the same user has recently used generative AI for statistical analysis, deep peer review and critique of work product, and the development of a software application to serve a personal need where none existed in the general marketplace.
We’re often asked, “You all are deep into this stuff. How are you using it?” And our response is always, “We’re using it for everything we can think of.” Don’t wonder if generative AI can help you with something. Presume it can, then figure out if it can’t. These tools add value to the Confluence team’s work and personal lives on an almost hourly basis. The overhang is large, but we’re always working to figure out its edges. We believe that’s the best way to understand the technology and forecast what’s to come.
We’ll leave you with something cool: Google DeepMind has a model, Aeneas, designed to add context and help historians interpret ancient Latin inscriptions.
AI Disclosure: We used generative AI in creating imagery for this post. We also used it selectively as a creator and summarizer of content and as an editor and proofreader.
