Confluence for 1.21.24
IMF report on AI's impact on the global economy. Anthropic publishes research on "sleeper agents." Taking a walk with ChatGPT. Hands-on with Copilot for Microsoft 365.
As usual, it was a big week for AI announcements and developments, catalyzed by the World Economic Forum’s annual meeting in Davos where AI was a dominant topic. While we’ve not watched every session or talk focused on AI, one that we did find particularly valuable was a panel discussion with several prominent AI leaders on “the expanding universe of generative models,” a wide-ranging conversation on the future of AI. It’s worth a listen. With that said, here’s what has our attention at the intersection of generative AI and communication this week:
IMF Report on AI’s impact on the Global Economy
Anthropic publishes research on “Sleeper Agents”
Taking a Walk with ChatGPT
Hands-On With Copilot for Microsoft 365
IMF Report on AI’s Impact on the Global Economy
Taking a longer view on AI and inequality.
Last week, the International Monetary Fund (IMF) released a study on Artificial Intelligence and the Future of Work. Differing in focus and approach from a Wharton and OpenAI paper that we previously discussed in Confluence, this study similarly identifies that knowledge-intensive jobs are more susceptible to the influences of generative AI than other work sectors.
This finding implies that economies with a dense population of knowledge workers are more likely to benefit economically from generative AI than emerging markets or low-income countries with fewer knowledge-based jobs. The potential also exists for generative AI to increase inequality within nations, as high-earning knowledge workers who harness the power of generative AI can significantly boost their productivity and, consequently, their potential earnings.
The IMF also introduces the “AI Preparedness Index,” a tool measuring countries’ readiness to integrate AI into their economies. This index evaluates digital infrastructure, human capital, labor policies, economic innovation, and regulatory and ethical frameworks, correlating these factors with the proportion of jobs highly affected by AI. Its initial assessment aligns with our expectations — advanced economies are generally better positioned than emerging markets, which are, in turn, more prepared than developing countries.
We expect much more research on how AI will impact labor markets and economies. As we learn more, we'll continue to share our findings here.
Anthropic Publishes Research on “Sleeper Agents”
The latest on AI safety and addressing vulnerabilities in AI systems.
AI safety is a focus at Anthropic, the creators of Claude generative AI, and their latest research explores what they call “sleeper agents” in AI systems. In a blog post on their website, Anthropic tees up a provocative research question:
“Humans are capable of strategically deceptive behavior: behaving helpfully in most situations, but then behaving very differently in order to pursue alternative objectives when given the opportunity. If an AI system learned such a deceptive strategy, could we detect it and remove it using current state-of-the-art safety training techniques?”
To test this question, the team at Anthropic took isolated versions of Claude with specific backdoor vulnerabilities and tested the degree to which they could then unwind these vulnerabilities through standard fine-tuning techniques (reinforcement learning, supervised learning, and adversarial training). The short answer — they couldn’t. These backdoors, once embedded in the models themselves, proved resilient to fine-tuning efforts to address them. Anthropic published the full paper if you’d like to dive in further.
These vulnerabilities do not exist in the models we use today, but this research offers several reminders for corporate communication professionals:
There is still much we don’t know about how these models really work. Even the companies and developers working on leading large language models (LLMs) continue to learn about how they function and respond to specific conditions. As they learn more, it’s important to stay current with the latest findings so you’re working from a grounded appreciation of strengths, weaknesses, and risks.
Start to engage in cross-functional conversations about AI if you haven’t already. The benefits and risks of bringing generative AI to organizations span disciplines and functions. The best way to make the right decisions for your organization is to bring different experiences and expertise to the table.
Develop and enforce principles for generative AI use in your teams and organizations. There’s not a single person who can fully articulate all the possible use cases for generative AI. To make sure colleagues use the technology in appropriate and thoughtful ways, they need to have expectations and guardrails in place.
This research does not warrant an alarm, but it does serve as a reminder of the open questions surrounding AI safety that we’ve yet to answer. We’ll continue to monitor this space and share what we learn.
Taking a Walk With ChatGPT
The ChatGPT mobile app’s voice capability is a powerful mode of engagement.
One of the major areas of ongoing capability improvements for generative AI applications is their increasing multi-modality: their ability to process inputs and produce outputs in various modes like text, audio, and imagery. The biggest public-facing advance to date came last September, when OpenAI introduced the ability to engage with the mobile app through voice and with images. Over the past few months, our appreciation for the voice capabilities in particular has grown. It’s clear from our conversations with clients, though, that many people are not aware of, let alone are using, this powerful capability.
As you might expect, using voice to engage with ChatGPT is particularly useful when you’re on the move and either can’t use your hands or look at a screen. We began using the ChatGPT app while walking, initially out of curiosity and gradually as a unique and valuable way of interacting with generative AI. Besides its hands-free and screen-free aspects, another significant advantage to this mode of engagement — and one we initially didn’t consider — is that the entire conversation transcript is available in the app afterwards. This allows you to review or resume the conversation at any point in the future, in whatever mode you prefer. For example, if you use voice interaction to brainstorm an idea on your morning walk to the office, when you arrive at the office, you can continue the conversation on your computer through the chat interface. You could also resume the conversation later that afternoon, the next morning, or two weeks later, for that matter.
Over the course of numerous walks — and, later, drives — we’ve identified several use cases that are particularly suited to this method of using the app. We’ve found it’s most helpful to think of the app in this case as your virtual walking partner and to instruct it to take on the persona best suited to the conversation you want to have. So far we’ve identified four personas that we’ve found particularly useful:
The Guide: This is how we used the app in our initial experiments. We found that ChatGPT could be surprisingly powerful as a guide to answer questions and help you learn more about your surroundings like natural phenomena, architectural details, the historical context of your location, or anything else you come across and are curious about in your surroundings.
The Thought Partner: Engaging ChatGPT in a conversation to refine your ideas, identify blind spots, critique your arguments, or otherwise stimulate and strengthen your thinking.
The Smart Transcriptionist: Treating ChatGPT as a digital notepad for recording your thoughts, with ChatGPT playing a more passive role and providing occasional prompts to spur your thinking.
The Interviewer: Using ChatGPT to facilitate reflection and insights by playing the role of an interviewer. Depending on the type of reflections and insights you’re looking to uncover and articulate, you can further refine this persona to reflect the type of interviewer best suited to those goals.
In a practical example of thought partnership, over the course of a recent 45-minute walk one of us engaged ChatGPT in a conversation that led to the creation of configuration instructions for a new GPT we’re building for this type of on-the-go use. When we returned to our computer after the walk, the configuration instructions — and the entire conversation — were waiting for us in the app and ready to be pasted into the GPT builder. We’re refining this GPT and look forward to sharing it with Confluence readers soon. In another, one of us engaged ChatGPT over the course of a 45-minute commute as a thought partner for setting annual goals and objectives. In the course of the conversation, the AI helped the user think through a critique of 2023, brainstorm ideas for objectives and intentions for 2024, and prioritize those objectives against urgency and impact. When the drive was over, the resulting nine 2024 goals were sitting in the chat transcript in the app.
The ChatGPT voice experience isn’t perfectly seamless. Latency poses a challenge and requires continuous adjustment to the pace of ChatGPT’s processing. As a result, the conversations can sometimes feel stiff, lacking the fluidity and rhythm of natural human conversation. The strength of your internet or mobile connection can also constrain performance. Despite these limitations, this way of engaging with ChatGPT is a distinct and uniquely powerful experience, significantly different from the usual text- and screen-based interface.
As with the use of ChatGPT via the text interface, we’d advise you to converse with it in voice as you would with a colleague. Give it a try, and you’ll quickly sense how the experience is different, with its own benefits and possibilities.
Hands-On With Copilot for Microsoft 365
Our first week with the integrated AI service shows short-term limits and long-term portents.
This past week, several of us finally got some hands-on time with Microsoft’s Copilot generative AI integration with its Office 365 suite, thanks to a new pricing tier that allows adding the service to any Microsoft 365 account for $30 per user per month.1 Our initial impression? Short-term, there’s some day-to-day utility here, but we still find GPT-4 a much more helpful and powerful generative AI companion. Long-term, we think the widespread integration of this technology into the Microsoft 365 suite is going to have significant implications for workers, leaders, and communication professionals.
Let’s address the short-term first. We won’t do a full rundown on everything of which Copilot is capable — we’ll refer you to Microsoft for that. The short story is that Copilot is now a function inside Word, PowerPoint, Excel, Teams, and Outlook. In each instance it offers a number of ways to help the user with their workflow, in some ways with real utility, and in other ways, not so much.
First, in Teams, Copilot is now the home page of your Chat tab and offers a number of ways to help. You can ask it to summarize chats and meeting transcripts (which is very helpful), tell you when your next meeting with someone is, or create content. Here’s an example of it creating a set of FAQs and a summary paragraph from a document:
It does a nice job, and quickly, including revising to add more questions and answers when prompted. You can also use Copilot to create and revise content in Word. It does a functional job and can create any number of documents from scratch or from source material. Here’s an example of it creating draft talking points from a longer process document:
Note that it won’t make the text longer when asked. We’ve found more than a few of these surprises so far, and it leaves us feeling not quite sure what we can ask Copilot to do or not do. That said, in Word it can also revise blocks of text, and quickly create tables from text. We’d describe the output as functional, but also find that GPT-4 is much faster, easier to work with, and more tunable to writing style than Copilot in Word at this point.
PowerPoint may be one of the most interesting Copilot use cases we’ve found so far. It can reorganize a presentation, create new slides for a presentation, and even create long, designed PowerPoint files from scratch with a click. Here’s an example:
Note that it created a very detailed presentation, including talking points, from the source material. Also note that from a design perspective, most of these slides are not great — though by using the “Designer” tool we were able to quickly clean many of them up. We also found the AI terrible at working within our own corporate PowerPoint template. But there’s probably a lot of utility here in creating at least a starting point for further revision. We will leave the question of whether or not the world needs more PowerPoints for another time.
Finally, Outlook. Many have touted the utility of having Copilot draft emails in Outlook as a time-saver. We’ve yet to find that utility, as at least for us, Copilot does a pretty poor job of understanding context and tone. We’ve had to substantively revise almost every email it’s drafted save the short ones, which we could write quickly without it anyway. Here’s an example:
Regardless of the quality of the output, our current operating principle is not to use AI to draft correspondence. We don’t want our clients and colleagues wondering about the authenticity of our communication, so at least for now, we’ll keep drafting personal email correspondence by hand.
Much more interesting to us is the Copilot “coach” feature in Outlook, which uses Copilot to critique and recommend changes to email text. Here’s an example in which we’ve given it some intentionally terse and not-very-relational text:
Note that Copilot reviews the email and gives advice for tone, sentiment, and clarity (in this case, general advice, but we’ve also had it suggest specific alternate language). We then use Copilot to revise the text based on that coaching, and it does a nice job. Email is a lean medium prone to misinterpretation. We can see the Copilot coach in Outlook saving people from creating unnecessary confusion in the least, and possibly improving or saving relationships at best. Truly, the ability to have a moment-to-moment source of feedback on your communication style (at least via email) is a powerful idea. As skilled as we think we are in our work, we’ll be using the coach on a daily basis to give a second perspective on our email correspondence.
With those examples behind us, here are some of our broad reactions to Copilot. First, it has a lot of limitations — for now. But this is the worst generative AI we will ever use, and we expect Microsoft to rapidly innovate and improve these tools. The utility they bring will only increase. One of these innovations is already on the horizon: Copilot Studio, which will allow users to create specific use cases for Copilot (similar to how one can create custom GPTs in GPT Plus), but with deep integration into the Microsoft 365 dataset.
Second, Copilot (and similar tools, including GPT-4) bring risk along with their utility. One risk is users putting too much faith in the content it creates — it’s still a large language model, and it can still hallucinate, make things up, and get things wrong. Someone needs to quality-check those FAQs, talking points, and PowerPoints — and we expect people will have a tendency to not do so. In the automation literature, this willingness to put too much faith in the technology is called “falling asleep at the wheel,” and there’s significant potential to do so with AI-generated content. As we’ve noted before, the work doesn't really go away as much as it shifts from creation to assurance.
Finally, and perhaps most important in the larger scope of things, is that we see in Copilot the portents of a significant shift in task and labor in corporate communication. A significant part of the work communication professionals do is content creation — FAQs, talking points, press releases, PowerPoint slides, product descriptions, blog posts, tweets, articles, and more. Some of this work is very nuanced and requires significant taste, experience, and discretion to get right. But much of it is far more mundane, and this more mundane content is work that large language models can increasingly create with ease.
So far most employees have not used these model to create content because they either have yet become facile with something like GPT-4, or their organizations don’t allow them to do so. But as Microsoft continues to integrate Copilot more deeply into its 365 and Office suite and make it more affordable (and we believe it will do so — in fact, we believe at some point it will be part of the standard offering), corporate communication professionals will increasingly take advantage of these tools in creating content. It won’t be long, though, before their internal clients (teams, leaders, and other employees) realize they can create much of this mundane content themselves. The role of content creation will shift from the professional, to the professional using the AI, to the internal client using the AI. This will have profound implications for governance, task and role definitions, skill development, and organizational design in our space. This is coming quickly. The time to start thinking about it is now.
We’ll leave you with something cool (if a bit creepy): a new Photomaker application that can put your face on just about anything …
AI Disclosure: We used generative AI in creating imagery for this post. We also used it selectively as a creator and summarizer of content and as an editor and proofreader.
Prior to this the Microsoft Enterprise-level of Copilot required a minimum of 300 users — a price too steep for us and many teams and small organizations.