Confluence for 9.29.2024
Understanding Agents (and NotebookLM in Action). The Impact of Framing on Generative AI Adoption. 185 Use Cases for Generative AI. Generative AI as a “Language Microwave.”
Welcome to Confluence. Here’s what has our attention this week at the intersection of generative AI and corporate communication:
Understanding Agents (and NotebookLM in Action)
The Impact of Framing on Generative AI Adoption
185 Use Cases for Generative AI
Generative AI as a “Language Microwave”
Understanding Agents (and NotebookLM in Action)
It’s time to start paying attention to what may be the next frontier of generative AI applications.
People in the field of artificial intelligence are increasingly talking about “agents” — AI that has agency: goals, the ability to plan, and the ability to make decisions and take actions in service of those goals and plans. Agents represent much of the promise of generative AI: a pool of intelligent actors that understand our needs and preferences and act on our behalf. Imagine an AI that you can chat with like ChatGPT, that has all your frequent traveler information, knows your travel preferences (aisle seats only, please), and can book travel on your behalf. Or the generative AI agents of five busy executives, coordinating via Microsoft Outlook and Google Calendar to book a tough-to-schedule meeting (in seconds). Or an agent that appreciates your culinary tastes, knows your schedule and your partner’s, and can find and book a dinner reservation.
These are all examples of relatively mundane utility, but they have powerful appeal. Now think what very intelligent agents with data could also do: vet job applicants not only on their applications, but on conversations with a generative AI designed to present case studies and test for cultural fit. Or search for and surface investment opportunities in the market. Or develop computer applications, even going so far as to create advertisements to hire coders for the job (which has already happened). Agents (large language models with great linguistic capacity and an ability to plan and reason, paired with data and decision rules) are the next frontier of generative AI applications — and likely the most powerful (and possibly disruptive) we’ve seen.
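To make the concept concrete, here is a minimal sketch of the kind of loop an agent runs: observe, plan, act, and repeat until the goal is met. Everything here is hypothetical and simplified; the planner is a stub standing in for a call to a large language model, and the two “tools” are stand-ins for real booking APIs.

```python
# A minimal, illustrative agent loop (hypothetical names throughout).
# A real agent would replace the stub planner with a call to a large
# language model and the "tools" with real APIs; the structure
# (observe, plan, act, repeat) is the point.

from dataclasses import dataclass, field


def search_flights(query: str) -> str:
    """Stand-in for a real flight-search API."""
    return f"Found 3 aisle-seat options for '{query}'"


def book_flight(option: str) -> str:
    """Stand-in for a real booking API."""
    return f"Booked {option}"


TOOLS = {"search_flights": search_flights, "book_flight": book_flight}


@dataclass
class TravelAgent:
    goal: str
    history: list = field(default_factory=list)

    def plan(self):
        """Stub planner. A real agent would ask an LLM which tool to call
        next, given the goal and the observations gathered so far."""
        if not self.history:
            return ("search_flights", self.goal)
        if len(self.history) == 1:
            return ("book_flight", "option 1")
        return None  # goal satisfied; stop acting

    def run(self) -> list:
        # The agent loop: plan an action, take it, observe the result,
        # and feed that observation back into the next planning step.
        while (step := self.plan()) is not None:
            tool, argument = step
            observation = TOOLS[tool](argument)
            self.history.append((tool, observation))
        return self.history


if __name__ == "__main__":
    print(TravelAgent(goal="BOS to SFO next Tuesday, aisle seat").run())
```

The decision rules here are trivial, but swapping the stub for a model that reads the goal and prior observations and returns the next tool call is the basic pattern behind many agent frameworks.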
While agent development is nascent, it’s moving quickly and will likely accelerate. Now is the time to start paying attention. One resource to help ground you is this recent podcast from Google DeepMind. While the focus is the use of agents in video games (imagine non-player characters in your favorite video game not following pre-written scripts, but instead acting as independent actors within a range of goals and objectives, making each interaction in each playthrough unique), it’s a solid grounding in the issues.
We believe the video is worth watching. But as it’s 45 minutes long, we also used the podcast as a reason to try the YouTube feature that is now part of Google’s NotebookLM, the Google generative AI tool that has captured our attention and the attention of many others (and that we wrote about last week). Add a YouTube link to your NotebookLM notebook and the notebook will quickly craft a summary for you and give you the option to generate an FAQ, study guide, table of contents for the video, timeline, briefing document, or, in a very meta twist, a podcast of the podcast (see the screenshot below).
We had NotebookLM write a briefing document for the video, which is in the footnotes¹. You can also chat with the contents of the video. We asked, “How do they define ‘agent’?” — here’s the reply:
Defining an “Agent” in Artificial Intelligence
Frederic Besse, a Senior Staff Research Engineer at Google DeepMind, defines an agent as “a very general concept,” stating that “an agent is an entity that can act in an environment.” He emphasizes that “action is what defines an agent.”
Besse further explains the following:
An environment is a space that provides observations to the agent and allows the agent to interact with it.
It’s important to note that an agent’s actions within an environment will change that environment. This interaction is crucial for developing AI systems that can perform useful tasks and understand the consequences of their actions.
Agency is the ability of an agent to take actions, regardless of whether those actions are pre-programmed or self-determined.
Autonomy, on the other hand, is the ability of an agent to act independently to accomplish a specific task. There are different levels of autonomy. For instance, a self-driving car navigating roads exhibits a degree of autonomy, while a factory robot programmed for a specific, repetitive task has less.
Besse gives several examples of agents, including:
Humans: We observe the world through our senses, have proprioception (awareness of our body's position), and can affect the world around us through our actions.
Robots: These physical agents can interact with and manipulate the physical world.
Autopilot systems: Although older autopilot systems operate based on handcrafted logic and pre-programmed instructions, they are still considered agents because they can take actions within their environments.
Virtual Agents: These agents exist and operate in virtual spaces, like video game characters. Their actions might not have real-world physical consequences, but they can still make decisions and impact their virtual environments.
Code-generating Agents: These agents operate in abstract spaces, like programming languages, and can take actions such as writing code, compiling functions, and searching the internet.
Besse distinguishes these agents from chatbots powered by large language models:
While chatbots can process information and generate human-like text, their primary function is communication, often limited to language-based output.
Some chatbots can search the internet or perform specific programmer-defined functions, but their action space is more constrained than that of agents operating in dynamic environments.
Chatbots are typically trained on massive text datasets scraped from the internet, rather than being trained to learn through trial and error within an interactive environment like many agents are.
However, Besse also acknowledges the potential for combining large language models with agents. He points to Google DeepMind's RT-2 and RT-X projects as examples of this integration. In these projects, language models are used to provide robotic arms with a “conceptual understanding” by training them to generate joint activations based on textual instructions.
We’ve written at length over the past year about the possible implications of generative AI for daily work and people’s jobs. We’ve noted, based on the work of Dan Rock and others, that the way to think about this is in terms of the overlap generative AI has with specific tasks within a given role, as there are few instances (at least so far) in which a generative AI application could completely replace a given role. And in many cases, something like Claude or ChatGPT’s overlap with a role is actually augmentative — it’s making you better at the task — rather than a replacement for you doing the task.
Agents will be the next significant variable in this calculus. Well-crafted agents, able to complete tasks with many steps and that require decisions along the way, should have the potential for much greater overlap with specific job tasks than do chatbots. If history is a guide, this will in many cases free people to engage in higher-level work. But the nature of general-purpose technologies is that their implications are difficult to predict. This is why we are paying close attention to agents, and think you should be, too.
The Impact of Framing on Generative AI Adoption
When it comes to rolling out generative AI tools, framing matters.
When we lead development sessions or seminars on generative AI with leaders, we always end with a discussion of what we call the “human factors.” This touches on what generative AI is likely to mean for things like authenticity, credibility, judgment, and skill maintenance and development. As readers of Confluence will know, we firmly believe that considerations of these factors in the context of generative AI are just as important as considerations of the utility of the tools. We’re always looking for new research to help us get smarter on this, and last week, that came in the form of a webinar from Stanford’s Arvind Karunakaran, titled “AI in the Workplace: Rethinking Skill Development.” The entire hour is packed with insights and is worth watching in its entirety, but one insight in particular stood out to us for its direct relevance to corporate communications: the impact of how leaders frame the purpose of generative AI tools.
Karunakaran begins the session by noting a key factor that makes generative AI tools unique: given their general capabilities and wide-ranging functionality, you cannot train people to use them the way you can with other enterprise software (for example, Zoom or an ERP system). Rather, unlocking their value and finding the functionality that works best for a given organization, team, or role requires experimentation and trial and error. And that’s the rub: absent any incentives to do so, few people other than the most curious in any given organization will spend the time or have the patience to do that. The challenge — which many organizations are facing head-on right now — is how to encourage fruitful adoption (that is, adoption of generative AI that adds value to organizations and to individuals) given these conditions.
After establishing this context and the challenge, Karunakaran spends the majority of the webinar reviewing a case study from a law firm. The firm in question took a decentralized approach to rolling out a generative AI tool (which Karunakaran refers to as LawBot). Each department and team was empowered to take its own approach, which created ideal conditions for a comparative analysis. A key finding came from the comparison of two divisions of paralegals, which Karunakaran labels Division A and Division B. Here are the results of the comparison:

These are individuals with the same role and in the same firm, but simply in different departments. So what accounts for the stark differences shown above? Karunakaran’s research found that one of the biggest differences across departments was how the use of LawBot was framed to employees. In Division A, the dominant frame was “skill development,” whereas in Division B, the dominant frame was “productivity improvement.” When employees viewed the tool as one that could improve their skills and free them up from the mundane to spend more time on complex, interesting tasks (as happened in Division A), they used it more, identified more use cases, and shared their knowledge with each other. When employees viewed the tool purely through the lens of productivity (as was the case in Division B), they were more likely to view AI as a threat and less open to use, experimentation, and knowledge sharing. The former led to a virtuous cycle of increasingly valuable use of generative AI, and the latter to a vicious cycle of dismissal. Ultimately, in the case of Division A, employees were able to perform new tasks and expand their jurisdiction because of their use of generative AI; employees in Division B were not.
With generative AI, as with anything else, framing matters. As more and more organizations embark on efforts to implement generative AI solutions across the enterprise, communication leaders (and all leaders) would be wise to heed the lessons from Karunakaran’s case study. Framing was not the only factor that led to the fruitful outcomes for Division A, but it was among the most important. The ability of organizations, teams, and individuals to get the most out of generative AI will depend on their willingness (and patience) to experiment. And how willing employees are to do that will be driven in large part by how the tools are presented to them. Productivity gains are important, of course, but emphasizing productivity over skill development and opportunity may be a losing proposition, not just for individuals’ ability to get value out of these tools but for entire organizations.
185 Use Cases for Generative AI
Google updates its list of customer use cases, which again points toward agents.
This week, Google published a blog post intended to show the real benefits of deploying generative AI tools (in this case, specifically those designed by Google). The post presents 185 use cases organized into six categories. Notably, Google labels each category based on the type of agent it represents, which aligns with the point we discussed in the first section of today’s Confluence. The six categories are:
Customer agents
Employee agents
Creative agents
Data agents
Code agents
Security agents
The list itself is worth skimming through to understand how some organizations are beginning to use generative AI to solve real problems or make real work more efficient or effective inside their organizations². While some use cases feel quite general and vaguely defined, there are plenty of examples that 1) speak to the specific application of capabilities squarely within the jagged frontier and 2) can serve as fodder for your own experimentation and application of generative AI.
This is where we see the value in lists like this one. Even as the best advice for understanding the capabilities of generative AI remains “bring AI to every table,” the second-best advice is to learn from those already using it daily. When we lead workshops or client sessions exploring AI applications, we always encourage participants to keep the conversation about generative AI alive. The more we share ideas, successes, and failures, the better we’re able to get real value out of these tools and identify current and future use cases.
Generative AI as a “Language Microwave”
Quicker isn’t always better.
In Stephen Marche’s recent article in The Atlantic, “AI Is a Language Microwave,” he argues that as AI excels at producing formulaic, technically sound content, the value of distinctly human writing will increase. Marche writes that while you could technically make a grilled cheese sandwich in a microwave, the result pales in comparison to one crafted on a stovetop — crispy, evenly melted, and imbued with that irreplaceable human touch. Similarly, while AI can produce technically correct writing at lightning speed, it often lacks the depth, nuance, and emotional resonance of human-crafted prose.
Marche argues that this distinction will elevate the value of what he calls “grilled-cheese writing” — uniquely human, at times imperfect, but ultimately more satisfying and impactful. As AI handles routine communications, the ability to produce original, thought-provoking content may become a key differentiator. Marche suggests that future writing education might thus focus less on technical proficiency and more on fostering creativity, critical thinking, and emotional intelligence.
In one of the white papers our firm wrote and shared last year, we came to the same conclusion: “As organizations integrate powerful generative AI technology … employees will have a very adept writing assistant at their disposal. This is likely to increase the average quality of written communication across the organization while decreasing the time invested in the process. Perhaps paradoxically, this will also likely mean that truly exceptional, powerful written communication becomes even more valuable, as ‘pretty good’ written communication becomes a commoditized norm. For highly strategic communication, individual voice and personality should be more powerful than ever.” Marche’s microwave metaphor is a sticky one, though, and a timely reminder of the continued importance of the human hand in writing truly resonant communications.
We’ll leave you with something cool: Spotify has released an AI playlist builder, which allows you to create playlists using text prompts.
AI Disclosure: We used generative AI in creating imagery for this post. We also used it selectively as a creator and summarizer of content and as an editor and proofreader.
Gaming, Goats & General Intelligence: A Deep Dive into DeepMind's SIMA
This briefing document reviews the key themes and insights from the Google DeepMind podcast episode “Gaming, Goats & General Intelligence with Frederic Besse.” The episode explores DeepMind’s SIMA project, focusing on developing general-purpose AI agents within video game environments.
Main Themes
Defining AI Agents: The episode distinguishes between various AI concepts:
Agents: Entities capable of taking action within an environment, like humans, robots, and even autopilot systems.
Agency: The ability to act and make decisions.
Autonomy: The ability to act independently to achieve a specific goal.
Games as Training Grounds: Games offer advantages for training AI agents:
Diversity and Scalability: They provide diverse environments and can be easily scaled for parallel training.
Safety and Control: The virtual nature of games provides a safe space for experimentation.
Human-Agent Interaction: Multiplayer games facilitate understanding human-agent interactions.
From DQN to SIMA: DeepMind’s Journey: The episode highlights key milestones in DeepMind's agent research:
DQN (Deep Q Network): An AI agent trained to play Atari games using only pixel data and reward signals.
AlphaGo & AlphaZero: AI agents mastering the complex game of Go, with AlphaZero achieving superhuman performance without human data.
AlphaStar: An AI agent excelling in the real-time strategy game “StarCraft II,” showcasing adaptability to partial information environments.
SIMA (Scalable Instructable Multi-world Agent): The current project focusing on training agents to follow instructions in diverse sandbox games.
SIMA’s Approach: Imitation Learning & Sandbox Games: SIMA employs imitation learning, training agents to mimic human actions in sandbox games like “Goat Simulator 3” and “Valheim.” This differs from previous reward-driven approaches.
Promising Results & The Path to AGI: SIMA demonstrates promising results:
Cross-Game Transfer Learning: Agents trained on multiple games perform better in individual games than those trained solely on that game.
Generalization to Unseen Environments: Agents trained on a set of games show the ability to perform tasks in a held-out, previously unseen game.
The Quest for AGI (Artificial General Intelligence): SIMA’s ultimate goal is to contribute to building truly general-purpose AI agents capable of performing at human levels across diverse tasks.
Key Ideas and Facts
Agents with Language Understanding: The integration of large language models (LLMs) allows agents to understand and follow instructions given in natural language.
Beyond Short-Horizon Tasks: Research is ongoing to extend SIMA’s capabilities beyond short instructions, enabling agents to complete more complex, long-horizon tasks.
The Importance of Generality: A key focus is developing agents that can generalize knowledge and skills across different environments, a crucial step towards AGI.
Potential Real-World Applications: The insights gained from SIMA could contribute to developing AI agents for various real-world applications, such as self-driving cars, robotic assistants, and personalized online tools.
Noteworthy Quotes
Defining Agents: “An agent is a very general concept. And I define it as an entity that can act in an environment. I think action is what defines an agent.” - Frederic Besse
Games as Research Platforms: “We think it’s a great proving ground to train AI systems to match human performance.” - Frederic Besse
Benefits of Sandbox Games: “So it’s a perfect sandbox game because you can do so many different things.” - Frederic Besse on “Goat Simulator”
Imitation Learning: “We just imitate humans, which-- yeah.” - Frederic Besse
Sparks of AGI: “Well, it’s very difficult to say, but we do hope, yes, that the methods we develop will help to make progress towards AGI.” - Frederic Besse
Conclusion
The episode provides a compelling glimpse into DeepMind’s ongoing research on AI agents. While still in its early stages, SIMA’s approach of combining imitation learning and diverse game environments offers a promising path towards developing more general and capable AI systems. The project’s ultimate aim is to contribute to the quest for AGI, a goal that, if achieved, could revolutionize countless aspects of our lives.
In no particular order, here are 10 use cases from the Google post that stood out to us for 1) their specificity and 2) how they leverage capabilities that sit squarely within the jagged frontier.
Formula E can now summarize two hours of race commentary into a two-minute podcast in any language, incorporating driver data and ongoing seasonal storylines.
Volkswagen of America built a virtual assistant in the myVW app, where drivers can explore their owners’ manuals and ask questions, such as, “How do I change a flat tire?” or “What does this digital cockpit indicator light mean?” Users can also use Gemini’s multimodal capabilities to see helpful information and context on indicator lights simply by pointing their smartphone cameras at the dashboard.
Best Buy can generate conversation summaries in real time using Contact Center AI, allowing live agents to give their full attention to understanding and supporting customers, resulting in a 30-to-90-second reduction in average call time and after-call work. Both customers and agents have cited improved satisfaction.
Warner Bros. Discovery built an AI captioning tool with Vertex AI and saw a 50% reduction in overall costs, and an 80% reduction in the time it takes to manually caption a file without the use of machine learning.
Bristol Myers Squibb is transforming its document processes for clinical trials using Vertex AI and Google Workspace. Documentation that took scientists weeks now gets to a first draft in minutes.
PODS worked with the advertising agency Tombras to create the “World’s Smartest Billboard” using Gemini — a campaign on its trucks that could adapt to each neighborhood in New York City, changing in real-time based on data. It hit all 299 neighborhoods in just 29 hours, creating more than 6,000 unique headlines.
The U.S. Dept. of Veterans Affairs is using AI at the edge to improve cancer detection for service members and veterans. The Augmented Reality Microscope (ARM) is deployed at remote military treatment facilities around the world. The prototype device is helping pathologists find cancer faster and with better accuracy.
England’s Football Association is training Vertex AI on the FA's historical and current scouting reports so they can be transformed into concise summaries, helping national teams discover future talent.
Bayes Impact builds AI products to support nonprofits, and its flagship product, CaseAI, is a digital case manager that integrates with an NGO’s current system to add smart features to draft action plans tailored to a beneficiary’s unique history; caseworkers have saved 25 hours of work per week on average.
Ipsos built a data analysis tool for its teams of market researchers, eliminating the need for time-consuming requests to data analysts. The tool is powered by Gemini 1.5 Pro and Flash models and uses Grounding with Google Search to improve real-world accuracy with up-to-date information from Search.