Confluence for 11.3.24
The AI adoption gender gap. Dealing with limitations in AI image generation. Bain and OpenAI expand their partnership. Generative AI seems to mirror some human tendencies.
Welcome to Confluence. Here’s what has our attention this week at the intersection of generative AI and corporate communication:
The Generative AI Adoption Gender Gap
Dealing With Limitations in AI Image Generation
Bain and OpenAI Expand Their Partnership
Generative AI Seems to Mirror Some Human Tendencies
The Generative AI Adoption Gender Gap
A new study finds that women are less likely to use generative AI than men (though nobody seems to know why).
We know a few things from the emerging research on generative AI use. Many people now say they have used generative AI tools, although most haven’t used the leading frontier models or used them extensively at home or work. We know that younger people seem to use generative AI more than older people, and that they aren’t necessarily better at using it when they do. And now, thanks to a new paper by Otis, Cranney, Delecourt, and Koning, Global Evidence on Gender Gaps and Generative AI (Berkeley Haas, Stanford University, Harvard Business School), we know that women may be less likely to use generative AI tools than men, even when they have similar access. This is not a questionable study: the authors hail from heavy-hitter institutions, and the analysis spans 16 studies covering more than 100,000 individuals across 26 countries.
While the paper goes into detail describing the gap, there’s only speculation as to the cause. Regardless, the paper got our attention for reasons the authors describe well:
The findings … document that gender gaps in AI are nearly universal. From mothers in Mumbai to managers in Madrid, women use AI less than men when analyzing data from 16 studies covering over 100,000 people along with novel data measuring who visits the top Generative AI websites. Moreover, equalizing access does not appear to fully close the gap, even when presented with the chance to use Generative AI women are less likely to use this new technology than men.
This disparity has the potential to be significant. As generative AI systems are still in their formative stages, the under-representation of women in their early use and testing risks shaping tools that fail to meet the needs of half the population (Koning, Samila, and Ferguson, 2021; Cao, Koning, and Nanda, 2023a). Biases in user data — similar to those that have previously led to racial disparities in AI performance — could result in AI systems that reinforce gendered stereotypes and overlook tasks more often performed by women (Koenecke et al., 2020; Guilbeault et al., 2024). Ensuring that AI tools are designed inclusively will be crucial for unlocking their full potential to enhance productivity and reduce inequality. Given recent estimates that AI has the potential to increase US economic output and worker productivity levels by nearly 20% over the next decade (Baily, Brynjolfsson, and Korinek, 2023), and that women make up just under 50% of the US workforce, a persistent 25% usage gap could result in hundreds-of-billions of dollars of lost productivity and output gains in the US alone.
If you lead people and are exploring or pursuing the use of generative AI, this paper warrants your attention. Without a clear sense of the cause, it’s hard to offer practical guidance on what the gap means for leaders and communication professionals or what they should do about it. But at the very least, it should be on your radar, and it’s going to be on ours.
Dealing With Limitations in AI Image Generation
The limitations are real, but shouldn’t stop you from using the tools.
This week, Leena Nair, the CEO of Chanel, made headlines by pointing out the bias in DALL-E 3, ChatGPT’s built-in image generation model. On a visit to Microsoft with her senior leadership team, Nair used ChatGPT to generate “a picture of a senior leadership team from Chanel visiting Microsoft” only to receive an image that was “all men in suits,” and not reflective of the diversity of the team. While the output in this case was disappointing, it’s not necessarily surprising. The bias in image generation models like DALL-E is a known — and substantial — issue.
And it’s not the only issue. Examples abound of DALL-E, Midjourney, and other image generation models generating bizarre images: people with three arms, strange objects that don’t make sense in the physical world, and inscrutable or nonsensical text. The reason is simple: these models are trained on images and do not have a built-in “understanding” of the world. They don’t “know” how many arms a human should have, how gravity works, or how to spell. They are imitation machines that predict what an image should look like based on the user’s prompt and their vast training dataset of images. As for bias, if there’s bias in the training dataset — which there is — the images will reflect it.
We’ve heard many people dismiss these tools outright until these issues are resolved. While we’re hopeful that the labs make progress on these issues — particularly around bias — we think dismissing the tools out of hand in the interim is a mistake. As with large language models, the first step is to be aware of these limitations and issues. The second step is to use that awareness to work around them. So, what does that look like for image generation models?
The primary principle we adhere to is to avoid using generative AI images for anything with a “right answer” — that is, anything where specific, factual details really matter. In practice, that means avoiding attempts to create realistic images of humans, specific locations, or anything else that can be verifiably proven wrong. For example, we used the prompt “Chicago skyline on November 1” to generate the image below using Midjourney.
There are likely thousands of images of the Chicago skyline in Midjourney’s training data, so this image gets some things approximately right. It looks and feels Chicago-ish, and it shows the iconic Willis Tower. But anyone familiar with Chicago could quickly point out a number of things that are wrong with this image. The Willis Tower is not that close to a park, for one, and architectural experts could likely find inaccuracies in the representation of the tower itself. Midjourney can create an image that resembles Chicago, but it cannot create a realistic image of Chicago that is factually accurate. In any scenario with reputational stakes — like using an image in a business presentation or publication — it’s simply not a good idea to use these models to create images where details like these matter.
The good news is that there are plenty of ways to create compelling images that don’t depend on that level of precision. For our Confluence cover images, we typically aim for creative, conceptual images that depict certain ideas, moods, or aesthetics — not factual depictions of real-world phenomena (human or otherwise). For example, there would be no “right answer” for an image that aims to depict an “urban American skyline on November 1,” the prompt we used to generate the image below.
Again, the first step to dealing with these limitations is to be aware of them. When tasked with generating depictions of real-world people, objects, or phenomena, these models will get the details wrong. And when the images depict people, these models (unless prompted otherwise) will likely reflect some amount of bias. When you’re aware of these issues and work around them, there’s plenty of space left to create compelling imagery that you can confidently bring to your work.
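For readers who generate images programmatically, the same principles translate directly into the prompt. Below is a minimal sketch, assuming the OpenAI Python SDK and DALL·E 3; the model choice and the prompt wording are our own illustrative assumptions, meant only to show how spelling out conceptual intent, and the mix of people you want to see, works around the limitations described above.

```python
# A minimal sketch, assuming the OpenAI Python SDK and DALL·E 3.
# The prompt text below is a hypothetical example, not one we used for Confluence.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Favor conceptual prompts with no single "right answer," and be explicit about
# details the model would otherwise guess at, including who appears in the image.
prompt = (
    "A conceptual illustration of an urban skyline at dusk in early autumn, "
    "impressionistic style, no identifiable landmarks, with a diverse group of "
    "professionals of different genders and ages in the foreground"
)

result = client.images.generate(model="dall-e-3", prompt=prompt, size="1024x1024")
print(result.data[0].url)  # link to the generated image
```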
Bain and OpenAI Expand Their Partnership
There’s building urgency to scale the benefits of generative AI.
Bain and OpenAI are expanding their partnership. What began as Bain reselling ChatGPT Enterprise licenses has evolved into a joint team focused on building industry-specific AI tools.
In many ways, this development feels inevitable. The current landscape of generative AI tools, while powerful, resembles a set of Swiss Army knives — versatile but not specialized. ChatGPT, Claude, and others excel at general tasks but weren’t designed with specific industry requirements in mind. And while versatility matters, it also puts a tremendous onus on the end user to figure out how to best use the tools.
We’ve already seen signs that organizations are growing impatient with generative AI experimentation and are facing mounting pressure to demonstrate tangible returns on their investments. And it’s no surprise that consulting firms would jump in to try to bridge the gap between general-purpose AI capabilities and industry-specific needs. Bain and OpenAI’s partnership may be a headline this week, but they are far from alone in seeing this opportunity.
The development carries another message worth noting. The architects of these AI models, along with their strategic partners, plan to maintain their aggressive push into enterprise markets. Organizations sitting on the sidelines may soon find themselves with little choice but to engage. The question increasingly shifts from “if” to “how” and “when.”
We won’t speculate on the extent to which this partnership will realize its ambitions, but we do expect that the path from concept to measurable ROI will rarely run straight. The challenge lies not just in building industry-specific tools, but in creating solutions that truly integrate with existing workflows and deliver meaningful improvements.
Regardless of the enterprise solutions born out of this partnership, our advice in the short term remains the same: continue to explore these tools, map their capabilities, and understand their limitations. This exploration doesn’t just satisfy curiosity — it drives efficiency, uncovers opportunities, and helps us navigate the “jagged frontier” of generative AI capabilities.
Generative AI Seems to Mirror Some Human Tendencies
New research suggests that AI can “overthink” things, too.
As we have written in the past, we believe the best way to interact with large language models is to think of them like people — to share context, engage in back-and-forth conversation, and recognize that they’re not infallible. New research from Princeton University shows that large language models may “act” like people even more than we realized — for better and for worse.
The study examines chain-of-thought prompting, which means getting AI models to break down their thinking step by step. Chain-of-thought is widely considered a best practice for interacting with LLMs in many use cases (and is foundational to OpenAI’s new o1 model), so we were surprised and intrigued to learn that the researchers in this study identified specific cases where chain-of-thought prompting dramatically reduced accuracy. What’s striking is that these cases map perfectly onto situations where explicit verbal reasoning makes people perform worse as well.
One example cited in the study is pattern recognition. When humans try to verbalize how they recognize complex patterns, they often perform worse than when they rely on intuition. The researchers found that AI models show the same weakness. Their accuracy dropped by up to 36% when asked to explain their pattern recognition process step by step. The researchers observed the same phenomenon with facial recognition. Just as humans become worse at recognizing faces when forced to verbally describe them, generative AI models showed significant accuracy drops (up to 32%) when prompted to explain their facial recognition process.
Chain-of-thought prompting remains a powerful way of interacting with LLMs, but this research demonstrates that it’s not a universal solution. Sometimes, we should look to human cognition as a guide. Does the task involve the kind of thinking that humans do better when they “show their work”? If so, chain-of-thought will likely help. Does it involve processes that people do better intuitively? Then direct prompting might be more effective.
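To make the distinction concrete, here is a minimal sketch of the two prompting styles, assuming the OpenAI Python SDK; the model name and the prompts are our own illustrative choices, not drawn from the Princeton study.

```python
# A minimal sketch contrasting chain-of-thought and direct prompting.
# Assumes the OpenAI Python SDK; "gpt-4o" and the prompts are illustrative choices.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

task = (
    "Which of these two subject lines is more likely to get a busy executive "
    "to open the email: (A) 'Quick decision needed by Friday' or (B) 'Update'?"
)

# Chain-of-thought prompting: ask the model to reason step by step before answering.
cot = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user",
               "content": task + " Think through it step by step, then give your answer."}],
)

# Direct prompting: ask for the answer alone, with no explanation.
direct = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user",
               "content": task + " Reply with only 'A' or 'B'."}],
)

print(cot.choices[0].message.content)
print(direct.choices[0].message.content)
```

For verbal-reasoning tasks like this one, the first style tends to help; for the more intuitive, pattern-like judgments the study describes, the second may serve you better.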
We’ll leave you with something cool: ChatGPT Search is now available to paid users. We’ve found it impressive in our use thus far.
AI Disclosure: We used generative AI in creating imagery for this post. We also used it selectively as a creator and summarizer of content and as an editor and proofreader.