Confluence for 6.29.25
Getting to yes (or no) faster. More on human-LLM feedback dynamics. Generative AI's marketing problem. Research on the ideation-execution gap.

Welcome to Confluence. Here’s what has our attention this week at the intersection of generative AI, leadership, and corporate communication:
Getting to Yes (or No) Faster
More on Human-LLM Feedback Dynamics
Generative AI’s Marketing Problem
Research on the Ideation-Execution Gap
Getting to Yes (or No) Faster
Using the speed and stamina of LLMs to make better, faster decisions.
When working with LLMs, it’s easy to fall into familiar patterns, the same ones we use when working with human colleagues. We ask for one iteration at a time. We’re polite and accommodating. We reflexively say “please” and “thank you.” This behavior is consistent with best practices when working with generative AI. Engage with it conversationally. Talk to it like a person. Give it feedback.
The downside of this guidance is that it can lead us to overlook the advantages that come precisely from the ways LLMs are fundamentally inhuman. Critically, we often see clients forget that these models are tireless. You can ask them to do things you would never ask a person to do, because asking a person would be unreasonable. We came across an excellent take in Ben’s Bites this week on what LLMs’ ability to work quickly and tirelessly makes possible.
Think about it. You spend a few hours working on what you think is a good idea, you start using it, realise its [sic] awful and move on to the next thing.
It cleans out your backlog of ‘what if’ products you thought may make you rich/famous/cool. It’s pretty humbling.
The biggest unlock of AI nowadays is how generative anyone can be, you can put stuff out everywhere, all the time. It clears the cobwebs, it puts products into the wild that fail or flourish.
The way to get more ideas is to be bored or working on a different idea.
You can just build stuff.
This is why I tinker so much. It gives me questions on tools, processes and markets - it gives me time to ask questions, get things wrong and go down rabbit holes I may never have entered.
The ability to pursue and validate ideas quickly makes it much easier to sift out unproductive ideas and move toward those with real merit. Teams and organizations often spend time mulling over decisions big and small, and one persistent barrier is the effort it takes to develop an idea far enough to evaluate it well.
With generative AI, we can accelerate nascent ideas to a point where their merit (or lack thereof) becomes apparent. Draft five different leadership responses to a current event in minutes, providing fodder for a conversation about whether we should even communicate at all. Model multiple outcome scenarios for major strategic decisions near-instantly, surfacing blind spots before committing resources. Prototype product concepts that would typically require weeks of meetings in a single afternoon. If you give the models enough context (and our advice is to err on the side of providing more, not less), you’ll find them to be helpful partners in working through ideas like these.
As valuable as working in this way with generative AI can be, it’s just as important to continue to use your own judgment and expertise to evaluate the outputs. LLMs can augment and accelerate the decision-making process, but we should not outsource the decision itself.
Next time you’re weighing any decision, big or small, bring it to your LLM of choice. Push it harder than you would a human colleague. Ask for ten variations. Request analysis from multiple perspectives simultaneously. Keep asking for more until you have what you need to make a better decision using your own judgment and experience.
Even if nine out of ten ideas prove unworkable, you’ll reach that conclusion in hours instead of weeks. And you’ll move forward with the tenth idea sooner and with greater confidence.
More on Human-LLM Feedback Dynamics
Critical feedback from LLMs lands differently.
Last week, we wrote about how to get better feedback from LLMs. We noted that “LLMs default to diplomacy, not candor” and that if we want to get critical, candid feedback, we need to prompt them accordingly. We then showed three different prompting strategies for having an LLM critique an email: the Basic prompt, the Constructive Middle Ground prompt, and the Harsh Critic. We noted that we found the Constructive Middle Ground prompt most effective, but several readers have since let us know that they actually prefer the Harsh Critic.
This was surprising because the Harsh Critic was indeed harsh. (Recall that it began its critique of the email we provided with “This email is a textbook example of corporate communication that checks boxes but fails to actually communicate. Here’s what’s wrong…”) The more we reflected on it, though, the more we realized these readers’ responses were consistent with something we’ve experienced ourselves: harsh, candid criticism does not seem to sting nearly as much when it comes from an LLM as when it comes from a human.
We decided to ask ChatGPT-o3-pro whether there’s been any research on this dynamic. Here’s the early part of that exchange:
User: Is there any research showing people are more willing to accept harsh feedback from AIs than they are from humans? That’s certainly been the case with me and I think if that’s true it actually has some really interesting implications. Look it up, and write up a detailed report of what you find.
ChatGPT-o3-pro: A small but fast‑growing body of evidence—most of it published in the past five years—shows that people often do react less defensively to negative or “harsh” feedback when it comes from an artificial agent (chatbot, robot, algorithm) than when an identical message comes from a human. The mechanism is usually framed as reduced threat: because an AI is not perceived as part of one’s social in‑group, the feedback stings less on an identity level even when the content is equally critical. The effect is neither universal nor unqualified, but the pattern is strong enough that it is now being replicated in large‑sample field data as well as in controlled lab studies.
o3-pro went on to cite six studies, which we’ve included in the footnotes¹, before identifying four converging themes across the research:
Lower ego threat → lower shame/defensiveness. Li et al. and Kang both link AI feedback to reduced shame relative to human feedback.
Perceived objectivity. Participants often describe machine feedback as more “objective,” muting accusations of personal bias (Marx et al.).
Algorithm aversion is context-specific. In domains where warmth or moral judgment is expected (e.g., health care, ethics decisions), people still prefer humans; but in technical or performance contexts, the same people may prefer machine criticism.
Back‑fire risks. AI feedback that undermines competence can hurt self‑efficacy (Li et al.) and, if perceived as unfair, erode trust (Thuillard et al.). Design matters.
We plan to explore this research further before drawing any firm conclusions, but if we take this initial summary as directional, we can begin to consider some powerful implications for individuals and organizations. We know that feedback is an invaluable tool for making people better and that most people do not get enough of it. We know, too, that as LLMs become increasingly ubiquitous, nearly everyone will have access to an on-demand source of feedback on anything. Add to those facts the research suggesting people are perhaps more open to critical feedback from LLMs than they are from humans, and we get a potentially potent combination for making people better at scale.
None of this suggests that LLM feedback should replace human feedback. But it does suggest that LLM feedback can be an increasingly powerful complement to it. The amount of feedback any human can give is finite, and the interpretation of that feedback will always be colored by interpersonal and identity dynamics. LLMs can both increase the volume of feedback and, it appears, provide a less threatening space for critical feedback.
There’s another important variable in this equation. To get the most out of the feedback we get from LLMs — either individually or at scale — we need to provide them with rich context about what excellence looks like for the task at hand. LLMs have an incredible breadth of knowledge and a general, conventional sense of what “good” looks like for most things. But unless we teach them, they do not know our criteria for excellence. Organizations that build this capability, or individuals who build this into their workflows, will likely have a significant advantage. We’ll share more on that next week.
Generative AI’s Marketing Problem
And the risks it suggests organizations might avoid.
Last week, we mentioned that the latest New York Times Magazine edition was heavily AI-focused. Among the worthwhile pieces was a story that challenges the way tech companies are advertising generative AI to the masses as a consumer product, titled “Why Does Every Commercial for A.I. Think You’re a Moron?” The article details a few recent ads from Meta and Google, and, with some well-deployed snark, argues that these brands talk to viewers as if they’re “oblivious, enfeebled, barely functioning idiots.” It’s an entertaining read, but it also sparked a bigger question: why is it so hard to find the right entry point for the AI conversation? It’s a question that bears asking within organizations, too. The same disconnect we’re seeing on the screen is being felt in conference rooms and all-hands meetings. Organizations trying to “sell” generative AI to their employees should consider what they can learn from the positioning that Meta and Google have used to date.
One of the things that makes generative AI such a powerful tool is its ability to bend to whatever the user needs in a given moment. Ad agencies haven’t yet figured out how to entice viewers with that ability, and their misstep is one we think leaders can avoid during internal rollouts. Google shows a dad using Gemini to draft a letter to an Olympic hurdler, leaving small-business owners wondering how it helps them. Meta features a parent racing to replace a late goldfish, a scenario in which grad students are unlikely to see themselves. The technology is so varied, and its use cases so diverse, that it takes highly surgical targeting to spark the “aha” moment in a potential user. A 60-second TV spot doesn’t seem to deliver that moment.
The same pattern repeats inside organizations. The sales team demos prospecting tools while legal looks on, wondering about contract review. HR lauds recruiting applications while analysts look for quick data synthesis. Just like those TV ads, these one-size-fits-all approaches fail to spark the individual “aha” moments that drive adoption. Generative AI isn’t going to work the same way for every employee, but with the right “selling,” its benefits can transcend departmental boundaries.
Marketers and advertisers segment their audiences into personas: athlete, parent, gamer, creative. This approach doesn’t work as well, though, when part of what makes the product’s potential so powerful is its shapeshifting utility. This writer, for example, might use generative AI to identify issues with her newly planted jasmine on the back patio, to pressure-test strategic messaging for a Fortune 500 client, to troubleshoot her picky 2-year-old’s dinnertime protests, and to get advice on how to approach a sticky issue with a colleague, all in the space of 12 hours. A classic marketing persona, static by nature, can’t keep up with all that swing. What people need instead are low-friction trials that let them prod, poke, and immediately see AI flex to their own varied use cases, at work and at home.
Current ads position generative AI as a substitute for human capability, and we believe corporate rollouts face risks if they take the same approach. Consumers and employees alike don’t want a replacement brain, but they might want an always-on companion who sharpens their own thinking. It’s a big ask for a small primetime ad spot, but it’s not impossible. For organizations, it’s entirely achievable. “Sellers” should ditch the replacement brain trope and lean instead into curiosity. Show a chef riffing on new flavor combos, a teacher turning a dry lesson into a scavenger hunt… Show your finance team member catching errors faster, your executive getting that first draft 30 minutes after requesting it. In short, show real people doing real things, but faster or better. We see it in our own demos every day: AI sells itself the moment someone sees their half-baked idea come back sharper, funnier, or more useful. The entry point to the AI conversation isn’t showing that AI can help with everything. It’s creating space for each person to discover that one thing that transforms their own work.
Research on the Ideation-Execution Gap
The promise and peril of ideas created by generative AI.
Below is Claude’s contribution for this week’s edition. As always, we include the prompt in our footnotes.
Stanford researchers recently put AI creativity to the ultimate test: they had both AI and human experts generate research ideas, then recruited 43 researchers to actually execute them. The results reveal a fascinating paradox. While AI-generated ideas initially scored higher than human ideas on metrics like novelty and excitement, this advantage evaporated—and sometimes reversed—once the ideas were implemented. AI ideas dropped significantly more than human ideas on all evaluation metrics after execution, with effectiveness scores plummeting by 1.879 points compared to just 0.052 for human ideas.
The culprit? Reviewers applied more rigorous standards during execution evaluation, considering factors like empirical performance, experimental rigor, and practical feasibility that were overlooked during initial ideation. AI excels at generating ideas that sound innovative and exciting on paper, but these ideas often contain hidden flaws—overly complex human evaluation requirements, impractical experimental designs, or insufficient consideration of real-world constraints. For instance, AI frequently proposed human evaluation studies that executors had to replace with automated methods due to time and cost constraints.
For communication leaders and organizations exploring AI collaboration, this research offers crucial insights. First, treat AI as a powerful ideation partner but not a substitute for human judgment—its ideas require careful vetting and refinement before implementation. Second, establish evaluation processes that go beyond surface-level assessment. Ask not just “does this sound good?” but “can we actually execute this effectively?” Third, leverage AI’s strength in generating diverse possibilities while relying on human expertise to identify practical constraints and implementation challenges early.
The good news? Even after the reality check of execution, AI-generated ideas performed comparably to human ideas overall. This suggests AI can be a valuable creative collaborator when properly managed. The key is understanding that AI’s impressive ability to generate novel-sounding concepts doesn’t automatically translate to real-world effectiveness. For organizations, this means building workflows that combine AI’s creative breadth with human practical wisdom—using AI to expand the universe of possibilities while maintaining human oversight for feasibility and execution planning. Think of AI as your enthusiastic intern with brilliant ideas who needs a seasoned manager to help separate the genuinely innovative from the merely novel.
We’ll leave you with something cool: Anthropic allowed Claude to run an automated vending machine at their offices. It made some interesting choices.
AI Disclosure: We used generative AI in creating imagery for this post. We also used it selectively as a creator and summarizer of content and as an editor and proofreader.
Li et al. (2025, Behavioral Sciences). Across U.S. and Chinese samples, negative feedback from an AI “manager” elicited far less shame and withdrawal intent than identical human feedback, though it modestly lowered self‑efficacy.
Zou, Ertug & Roulet (2024, Journal of Business Research, 172). Among 93,000 software contestants, algorithm-scored failure feedback accelerated learning and heightened receptivity to later human peer advice, acting as an emotional buffer.
Marx et al. (2024, International Journal of Social Robotics). Participants reported equal or greater intrinsic motivation after blunt robotic criticism versus human criticism, while physiological stress rose only for positive feedback, not negative.
Thuillard et al. (2022, Computers in Human Behavior). Computer‑delivered negative feedback reduced physiological stress compared with a human source, even though it was judged less fair, and caused no performance drop.
Kang (2020, Penn State dissertation). Corrective comments were more readily accepted when labeled as coming from a virtual agent—unless the agent was highly anthropomorphic, which nullified the benefit.
Hoque et al. (2013, MIT “MACH” coach study). Students tolerated highly granular critique from the automated job‑interview coach, describing it as “less intimidating” than a human interviewer delivering the same points.