Confluence for 8.11.24
Here comes the authenticity storm. LLMs as graders. On the metacognitive demands of working with genAI. S&P Global’s AI training push.

Welcome to Confluence. Here’s what has our attention this week at the intersection of generative AI and corporate communication:
Here Comes the Authenticity Storm
LLMs as Graders
On the Metacognitive Demands of Working with Generative AI
S&P Global’s AI Training Push
Here Comes the Authenticity Storm
The only way to have source credibility is to earn and then keep it.
The photo at the center of this post at Snopes (if you don’t know Snopes, it’s an online resource for validating or debunking urban myths of all sorts) has been getting a lot of attention online over the past few days. We won’t post the photo here, but you can visit the link and see for yourself.
Some people online say it’s generated by AI. Others say it isn’t. Snopes says they can’t tell. We won’t wade into it either way, and the questioning and debate are clearly driven in part by political interests. But it raises the question: how could you tell? Just a year ago there were many tells: people would have three hands, or the wrong number of fingers, or distorted faces, or odd proportions. Now the image generation quality is getting so good that one wonders, can you tell at all?
Here are two images. One was shot by one of our authors, the other created by Midjourney. Which is “real”?


The one on the left is a real photograph. The one on the right was created with Midjourney. The cup, table — all of it — does not exist.
Or how about this video, generated from an AI-generated photo, which has been circulating on X the past few days (this is a screen recording because Substack won’t allow us to embed X videos … follow the link to see the higher-quality original):
Completely AI-generated. What’s true now for photography and imagery will soon be true for video. (And if you want some guidance on how to tell if images are generated by AI, we’ve created a primer on that with Perplexity here.)
So what to make of it? We’ve been saying since the start of Confluence that the day will come when employees will be much more apt to question the source and credibility of what they see and hear from the organizations in which they work. In some ways, this is already true today: most employees don’t believe their senior leaders write all their speeches or tweets. In fact, when a CEO does write their own tweets or LinkedIn posts, it often makes them more credible. But even when a ghostwriter is assumed, most employees don’t question the authenticity of what they see or hear from “official” organizational sources.
But very soon we think employees will be conditioned, as a first reaction, to question the authenticity of what they see or hear. And this questioning is going to extend to many, many domains outside of work: homework, telephone solicitations, emails, political images, audio recordings, music, art — to name but a few.
In some ways, this is a dark thought. We don’t relish a time when someone’s first reaction is to distrust rather than trust. But that reaction is a rational one, especially after people have been convincingly, if innocently, fooled a few times, as with the examples above.
The only way to counter that reaction is with proof (someone I trust was there in person, or I saw it with my own eyes) or proxies for proof (watermarks for images, for example). This really isn’t a new problem. It has been present in art and currency for ages. It has been present on the internet since the beginning; it just isn’t something many people think about. Can you really verify the source of the things you read online? How do you know the person publishing that website is who they say they are? How do you know what they are saying is accurate? The fact is that, in most cases, we don’t.
In the long term, technology will likely provide means of proving authenticity. We have special paper and holograms for currency, and we have carefully established provenance for art (and in fact, this is what the blockchain provides today: a distributed and permanent ledger). In the past few weeks, it became known that OpenAI has for some time had a form of embedded watermarking technology that would allow text generated by ChatGPT to be identified. So eventually we will develop ways to trust. But between now and then, it’s going to be problematic. The issue isn’t just “disinformation” or “misinformation”; it’s “real” or “not real.”
So what do you do in the short term? You disclose the use of generative AI in any and every instance in which it may be disquieting for others to learn you have used it. Establish a reputation for always being clear about when you have and have not used the technology, and make that reputation for consistent disclosure your watermark. As with anything else, your track record of openness and consistency will be the source of your credibility. Here are examples of what we use:
Generative AI was an editorial and proofreading resource in the creation of this content. All use protected client confidentiality.
Generative AI was a resource in creating select content in this document. All content has benefited from human review and revision, and all use protected client confidentiality.
Generative AI was used as a secondary research resource in identifying or summarizing literature for this document. All use protected client confidentiality.
Generative AI was used as a qualitative analytic tool for this work. All output was subjected to human review and verification, and all use protected client confidentiality.
But here’s a question: should you disclose if you use AI to proofread? Of course not. We’ve used spellcheck for years, and nobody discloses that. Should you disclose if you use AI to help you brainstorm ideas? Probably not. We talk to colleagues to brainstorm all the time, and we don’t disclose that.
But the “I used a colleague” angle is an important frame of reference for all of this, because the real question about authenticity isn’t just “Is it real or not?” It’s also “Did you do that, or did someone or something else do it?” And society and organizations have long-standing, well-honed standards for when we should represent work as our own. None of us with any ethical compass would ever take credit for work that someone else had done. We have a word for that: plagiarism.
So our advice is to think of generative AI as a colleague rather than a tool when making these decisions. If you wouldn’t take credit for someone else doing part of the work, don’t take credit for AI doing that part of the work. If nobody would ever wonder or care if a colleague had done something, you probably don’t need to worry about disclosing that generative AI had done something (unless you’re charging a fee based on human time spent, in which case, you should adjust your fee). But in the end, the litmus test is the question, “Would I, or my team or my organization, be embarrassed if it were known how we used AI in this work?” If the answer is “Yes,” or even close to “Yes,” you should disclose.
We are optimists about generative AI, and we try to be optimistic in this space. But you should probably expect the authenticity problem to get worse before it gets better. The good news is that you can be a great steward of your own credibility as things evolve — making your track record your watermark — and we hope this post offers some guidance in how to do so.
And for the record, a human wrote every word of this post. AI gave it a few proofreading passes. We feel good about that, and presume you do, too.
LLMs as Graders
New data compare how LLMs stack up against human evaluators of student work.
A few weeks ago, we explored a Wall Street Journal piece on AI in education, noting how educators are struggling with AI’s role in evaluating student work. The article relied on anecdotes and was scant on empirical findings. Now, a new study gives us data that can help inform this conversation, shedding light on how large language models (LLMs) perform when grading short-answer responses.
The findings? LLMs, specifically GPT-4, performed on par with human graders across various subjects and grade levels. This result alone is noteworthy, but the implications run deeper.
First, the researchers found a substantial performance gap between GPT-3.5 and GPT-4. While GPT-3.5 lagged behind human graders, GPT-4 matched them. This stark contrast underscores a point we find ourselves making time and time again: the specific AI model you use matters enormously. The difference is not incremental, and staying on top of which model you’re using is arguably the most important thing to pay attention to when working with AI.
Perhaps more interesting still, the study found only a small performance difference between zero-shot prompting (simply asking the model to evaluate answers) and few-shot prompting (first providing examples of correct answers). This squares with a growing view among experts that prompt engineering’s importance may diminish over time: skillful prompting can improve performance, but it’s unlikely to dramatically extend a model’s inherent capabilities.
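To make the zero-shot/few-shot distinction concrete, here is a minimal sketch of what the two prompting styles might look like in practice. It assumes the OpenAI Python SDK, and the question, student answer, and grading scale are our own illustrative placeholders rather than materials from the study.

```python
# A minimal sketch of zero-shot vs. few-shot grading prompts.
# The rubric, question, and answers below are illustrative only.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

QUESTION = "Why does ice float on water?"
STUDENT_ANSWER = "Because ice is less dense than liquid water."

# Zero-shot: simply ask the model to grade the answer, with no examples.
zero_shot_prompt = (
    "Grade this short answer from 0 to 2 and briefly explain the grade.\n"
    f"Question: {QUESTION}\n"
    f"Answer: {STUDENT_ANSWER}"
)

# Few-shot: show the model a couple of graded examples first, then the
# answer we actually want evaluated.
few_shot_prompt = (
    "Grade each short answer from 0 to 2 and briefly explain the grade.\n\n"
    "Question: Why is the sky blue?\n"
    "Answer: Sunlight scatters off air molecules, and blue light scatters most.\n"
    "Grade: 2 (correct and complete)\n\n"
    "Question: Why is the sky blue?\n"
    "Answer: Because the ocean reflects onto it.\n"
    "Grade: 0 (a common misconception)\n\n"
    f"Question: {QUESTION}\n"
    f"Answer: {STUDENT_ANSWER}\n"
    "Grade:"
)

def grade(prompt: str) -> str:
    """Send a grading prompt to the model and return its reply."""
    response = client.chat.completions.create(
        model="gpt-4",  # illustrative; the study compared GPT-3.5 and GPT-4
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

print("Zero-shot:", grade(zero_shot_prompt))
print("Few-shot:", grade(few_shot_prompt))
```

The only difference between the two calls is whether the prompt includes worked examples. The study’s finding is that, at least for GPT-4, those examples bought relatively little additional grading accuracy.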
At the end of the day, as we noted in our discussion of the WSJ article, the question isn’t just whether AI can match human performance on a specific task. We must consider how AI fits into the broader process of development and improvement. In education, grading isn’t just about assigning scores; it’s about making students better. Similarly, in our work, AI’s goal isn’t merely to produce output more efficiently. It’s to make our people better at what they do.
On the Metacognitive Demands of Working with Generative AI
The new cognitive demands generative AI places on users are real, but you can minimize them.
“It takes me so long to figure out how to do what I need to do with AI that I wonder if it’s worth the time at all.”
“I feel like I’m constantly second-guessing whether I can trust what the AI gives me.”
“I’m never sure when I should use AI and when I should just do the task myself.”
If any of these sound familiar, you’re not alone. When we lead conversations with clients or colleagues about generative AI, we hear variations of these themes again and again. The statements all point to the same aspect of working with generative AI: the need for a special kind of mental effort to “think about our thinking” to get the most out of these tools.
The proper term for this effort is metacognition, and a recent paper by Lev Tankelevitch, Viktor Kewenig, and colleagues, “The Metacognitive Demands and Opportunities of Generative AI,” sheds light on this phenomenon. The researchers argue that interacting with generative AI systems imposes significant “metacognitive demands” on users: requirements for monitoring and controlling our own thought processes as we work with these tools. These demands are exactly what our clients are articulating in statements like the ones that open this post. The paper points to three categories of metacognitive challenges:
Prompting: This involves not just writing a prompt, but being aware of our goals, breaking down tasks into manageable parts, and continuously adjusting our approach based on the AI’s responses.
Evaluating and relying on outputs: Once we get a response from an AI, we need to assess its quality and relevance. This requires confidence in our own expertise and judgment, as well as an understanding of the AI’s capabilities and limitations.
Automation strategy: Perhaps the most overarching challenge is deciding when and how to incorporate AI into our workflows. When do we use these tools, and when do we not?
These three categories ring true for us and go a long way toward explaining why, despite the power of generative AI, many users find themselves frustrated or uncertain about how to effectively incorporate these tools into their daily work. So, what can we do about it? In our view, there are several near-term strategies we can employ to minimize these metacognitive demands.
The first is to develop a basic understanding of how these tools work. You don’t need to become an AI expert, but knowing the fundamentals can help you set realistic expectations and use the tools more effectively. When you understand how they work, you’ll be less likely to get yourself into sticky situations and more likely to use them where they really excel.
Next is to learn some basic best practices for prompting and use. Familiarizing yourself with these can significantly improve your effectiveness in working with generative AI tools. You might start with our most recent prompting guidance, which you can find here.
Third, and perhaps most important, is to simply spend time using the tools. Like any skill, working with AI improves with practice. The more you use these tools, the better you’ll become at navigating their quirks and capabilities. The rule of thumb we’ve adopted from Ethan Mollick is to spend 10 hours with these tools, on either personal or professional tasks. If you haven’t done this already, you may be surprised by the dividends it pays in developing a sense of “feel,” or intuition, for working with generative AI.
In the longer term, we expect many of these metacognitive demands to be addressed through technological advancements. Future AI systems will have better memory capabilities, allowing for more coherent, context-aware interactions. They may also develop a more nuanced understanding of user intentions, reducing the need for precise prompting. Other innovations we can’t yet foresee will likely emerge to make human-AI interaction more seamless and intuitive. The metacognitive demands imposed by generative AI tools will never go away, but they will probably never be as high as they are today. Remember (and we’re referencing Ethan Mollick again here): today’s AI is the worst we will ever use.
S&P Global’s AI Training Push
The financial giant is betting big on AI skills for its workforce.
This week, S&P Global and Accenture announced a strategic partnership under which 35,000 S&P Global employees will receive comprehensive generative AI training starting this August. The partnership, which the press release says is intended to “fuel generative AI (gen AI) innovation across the financial services industry,” follows a broader trend across industries of prioritizing AI skills: just last month, we reported on JPMorgan Chase’s plans to provide all new hires with prompt engineering training.
This focus on training, rather than just rolling out AI tools, signals a shift in how companies are approaching AI adoption. Organizations are recognizing that to truly leverage AI’s potential, employees need more than access — they need skills. As more companies invest in AI training, questions arise about the level of AI competence that will become essential for future professionals. Will prompt engineering become as fundamental in financial institutions as spreadsheet skills have long been? How will this shift impact hiring practices and career development across industries?
For communication teams, these developments bring new challenges and opportunities to lead within their organizations. As AI adoption intensifies, the implications for workplace dynamics, productivity, and ethics multiply. In addition to the questions of disclosure raised at the top of this issue, communication leaders should also consider how to effectively explain AI’s role and impact to various stakeholders, and develop strategies for addressing concerns about AI’s effect on jobs and workflows. They’ll need to promote responsible AI use while facilitating collaboration between AI-trained and non-trained employees.
As we move from a world where AI is used by a handful of early adopters to one where it’s broadly integrated across organizations, communication professionals will play a crucial role in shaping perceptions, guiding policies, and ensuring transparent communication about AI use. The race to upskill in AI is gathering speed, and its implications for the workforce are only beginning to unfold.
We’ll leave you with something cool: Speaking of authenticity, legendary sports announcer Al Michaels explains why he lent his voice to NBC for AI-generated Olympics recap coverage.
AI Disclosure: We used generative AI in creating imagery for this post. We also used it selectively as a creator and summarizer of content and as an editor and proofreader.