Confluence for 11.2.25
AI 101: next token prediction. Beyond next token prediction: how reasoning models work. Understanding the AI boom. Editing as an essential skill.
Welcome to Confluence. Here’s what has our attention this week at the intersection of generative AI, leadership, and corporate communication:
AI 101: Next Token Prediction
Beyond Next Token Prediction: How Reasoning Models Work
Understanding the AI Boom
Editing as an Essential Skill
AI 101: Next Token Prediction
It’s technical, but you need to know what it is and understand what it means.
Several times over the past few weeks one of your authors has asked a large group of very senior leaders, “Raise your hand if you know what next token prediction means.” Across nearly 90 people from leading organizations, two hands went up. That’s a problem. Understanding next token prediction, and what it means for how you and your organization use large language models, is essential to using them well. So today we’ll offer a primer.
Large language models like Claude, ChatGPT, and Gemini are “generative” technologies, not “deterministic” technologies. This makes them fundamentally different from most of the ways we have traditionally used computers.
Deterministic technologies produce the same output every time you give them the same input. A calculator always returns 4 when you enter 2+2. A database query for “customers in Florida” returns the same list each time (until the data changes). Google always serves up websites from its index of the web. Excel formulas, accounting software, and search algorithms all execute fixed logic. You can predict exactly what they’ll do because they follow explicit rules.
Generative technologies produce variable outputs from the same input, creating new outputs based on probabilities rather than fixed rules. Ask a generative technology to do the same thing twice and you’ll likely get two different outputs. The system isn’t executing predetermined logic; it’s sampling from a set of probability distributions.
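To make the distinction concrete, here’s a minimal Python sketch of our own (not code from any real model): the deterministic function returns the same answer every time, while the generative one samples from a probability distribution, so repeated calls can differ.

```python
import random

def deterministic_sum(a, b):
    # Fixed logic: the same inputs always produce the same output.
    return a + b

def generative_next_word(options):
    # Sampling: the output is drawn from a probability distribution,
    # so identical calls can produce different results.
    words = list(options.keys())
    weights = list(options.values())
    return random.choices(words, weights=weights, k=1)[0]

print(deterministic_sum(2, 2))  # always 4
print(deterministic_sum(2, 2))  # always 4

continuations = {"sunny": 0.5, "mild": 0.3, "stormy": 0.2}
print(generative_next_word(continuations))  # varies from run to run
print(generative_next_word(continuations))  # may differ from the line above
```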
Large language models are generative (indeed, “GPT” stands for “Generative Pre-trained Transformer”). They create text, music, images, video, and more from the ether, not by extracting content from a giant repository of all the world’s knowledge, music, and images. They do this through a massive amount of machine learning and something called next token prediction.
These models are (in very simple terms) autocomplete tools. Very (very, very) simply stated, as the system generates a series of words, it’s guessing the bit of text that should come next in the sentence (or which pixel should be next to the prior pixel, in the case of images). These bits have a name: “tokens.” For a variety of complicated technical reasons, the systems don’t predict full words, but short sets of characters, usually three to seven characters long. These slices are tokens. Common words like “the” or “is” are single tokens. Longer words become multiple tokens. “Understanding” becomes two tokens, “under” and “standing.” (These slices are based on the character sequences that most often occur together, not syllables.)
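If you’d like to see tokenization for yourself, the short sketch below uses tiktoken, OpenAI’s open-source tokenizer library (we’re assuming you have Python and tiktoken installed; other labs use different tokenizers, so the exact splits will vary).

```python
# Requires: pip install tiktoken
import tiktoken

# "cl100k_base" is one of OpenAI's published tokenizer encodings.
enc = tiktoken.get_encoding("cl100k_base")

for text in ["the", "is", "Understanding", "The capital of France is"]:
    token_ids = enc.encode(text)
    pieces = [enc.decode([token_id]) for token_id in token_ids]
    print(f"{text!r} -> {len(token_ids)} token(s): {pieces}")
```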
The prediction comes from a huge and complex set of correlations (trillions of them) between tokens, generated by feeding the model a massive amount of text (and by massive, we mean every word of text openly available on the internet) and having it study the patterns of words across that text. Over time it learns the likelihood that some tokens follow others. It learns, for example, that if the sentence is “The capital of France is …” there is a very high probability that the next token is “Paris.” Similarly, it learns that there is a very low probability that the next tokens spell “bookcase.”
With these correlative weights established and honed through additional training, large language models become the world’s most sophisticated guessing machines. They learn patterns, and they get very good at predicting what comes next in the pattern. And all sorts of things are patterns: music, pixels in images, frames in video, radiology scans, contracts, newsletters, voices, and more. If it has predictable sequences, a generative model can learn it and predict from it.
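Here’s a deliberately tiny sketch of the idea. The probability table below is invented for illustration; a real model computes these probabilities on the fly from billions of learned weights rather than storing a lookup table, but the loop captures the essential move: look at the text so far, weigh the likely continuations, pick one, repeat.

```python
import random

# An invented probability table for a handful of contexts. Real models
# compute these probabilities from learned weights; they don't store them.
learned_probabilities = {
    "The capital of France is": {" Paris": 0.9, " located": 0.1},
    "The capital of France is Paris": {".": 1.0},
    "The capital of France is located": {" in": 1.0},
    "The capital of France is located in": {" Paris": 1.0},
    "The capital of France is located in Paris": {".": 1.0},
}

def generate(context, max_tokens=10):
    for _ in range(max_tokens):
        probs = learned_probabilities.get(context)
        if probs is None:
            break  # our toy table has run out of patterns
        tokens = list(probs.keys())
        weights = list(probs.values())
        # Sample the next token from the weighted possibilities.
        context += random.choices(tokens, weights=weights, k=1)[0]
    return context

print(generate("The capital of France is"))  # usually ends "...is Paris."
```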
But, and it’s an important “but,” these systems are still generative. They make things up one token at a time based on what has come before, always choosing from possibilities weighted by likelihood. Even with identical prompts, the model might select different tokens, leading down different paths as it decides what comes next. Thanks to advances in technology, a large amount of context can inform “what comes next”: hundreds of thousands of words, not just the few that precede it in a sentence. This is why you can feed an LLM a set of technical reports and ask it to make sense of them. It considers all those reports as preceding context which, combined with your prompt, allows it to predict what should come next and produce an uncanny, thoughtful, time-saving technical summary.
So as smart as they seem, these models don’t “know” anything. They perform mathematical operations on weights to predict statistically likely continuations. Because of this they are amazing at pattern-based work, but they can fail at things requiring factual accuracy or logical reasoning, because next token prediction doesn’t do those things. We call these failures “hallucinations,” but you are better off thinking of them as “inventions” or “mistakes.” The model guessed wrong, sometimes spectacularly so (we have written before about judges throwing legal briefs out of court because they cited non-existent case law that a model authoritatively invented). You should not ask “Where did it get that?” of an LLM. The answer is “Nowhere,” because the model made it up a few letters at a time.
The good news is that in the past two years LLMs have gotten much more accurate, for several reasons. One is advances in how the labs build and tune the models. Another is the addition of “reasoning” to the models, in which they talk to themselves prior to responding to you (usually “thinking” through a plan, things to be careful about, etc.), and this additional context, combined with your prompt, improves their predictions. (More on that in the piece below.) Finally, almost all leading models now have the ability to connect to deterministic systems, which then return authoritative data to them to include in their responses. Almost all LLMs can now search the web, and they use a search engine (deterministic, not generative) to do so. Our installation of Claude at our firm can search our SharePoint, Outlook, and more, and what it returns is deterministic. Our AI coach ALEX searches a database of our proprietary content before constructing a response. While the models still use generative technology to craft their response, they are very good at not hallucinating authoritative data passed to them by a deterministic tool.
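Here’s a simplified sketch of that pattern (the functions and logic are our own invention for illustration, not how any particular product is built): the model recognizes that a question calls for deterministic work, hands it to a deterministic tool, and then drafts its response around the authoritative result the tool returns.

```python
def calculator_tool(a: float, b: float, op: str) -> float:
    # A deterministic tool: the same inputs always return the same answer.
    return {"+": a + b, "-": a - b, "*": a * b, "/": a / b}[op]

def assistant_answer(question: str) -> str:
    # A stand-in for the model deciding to call a tool rather than guess.
    if question == "What is 1,847 times 392?":
        result = calculator_tool(1847, 392, "*")  # authoritative data from the tool
        # The generative step drafts language around the verified number.
        return f"1,847 times 392 is {result:,.0f}."
    return "No tool needed here; this answer comes from learned patterns alone."

print(assistant_answer("What is 1,847 times 392?"))
```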
If you want a much deeper (and more beautiful) explanation of next token prediction, visit this graphical explorer by the Financial Times. But for now, here’s what next token prediction means for you: Always use a reasoning model. Verify all factual claims. Be sensitive to whether you’re using the model for something it’s inherently good at (drafting, brainstorming, restructuring text, generating code, spotting patterns) or not. If you want to use it for deterministic work (math, analysis, search, etc.), be sure it’s using a deterministic tool and not just making up the answers. Finally, treat the output like work from a talented but unreliable colleague, work that requires expert review before you act on it. Unlike some other general purpose technologies like the transistor, you need to know a bit about how these models work to use them well and not stumble over their weaknesses. Understanding next token prediction is part of that.
Beyond Next Token Prediction: How Reasoning Models Work
An OpenAI lead researcher explains.
The piece below was written by Claude Sonnet 4.5 after a short exchange with a Confluence writer. We provided Claude with the podcast transcript as well as the item above and prompted Claude to write a 6-7 paragraph synthesis of the two. Our only edits were to add the link to the podcast, correct the spelling of Tworek’s name throughout, and correct the capitalization in OpenAI model names. Having listened to the entire podcast ourselves, we can confirm the accuracy of Claude’s writeup.
Earlier in today’s edition we explained next token prediction—the foundation of how large language models work. Now we’re building on that foundation to explain reasoning models, one of the most significant developments in AI over the past year. A recent podcast with Jerry Tworek, OpenAI’s VP of Research and one of the lead creators of GPT-5, offers an unusually clear window into what reasoning actually means and why it matters. The podcast is worth your time if you want to understand how these systems work. There’s a lengthy biographical section in the middle (Tworek’s journey from Poland to trading to OpenAI) that’s interesting but skippable if you want to focus solely on the technical concepts.
When ChatGPT or Claude says it’s “thinking,” what’s actually happening? Tworek explains it simply: the model is talking to itself before responding to you. More specifically, it’s generating what’s called a “chain of thought”—verbalizing its problem-solving process step by step using human words and concepts. If you ask a person a hard question, they rarely have the answer immediately. They work through it: “First I need to figure out X, then calculate Y, then connect that to Z.” Chain of thought is the model doing the same thing, writing out its thinking process as it goes. This is fundamentally different from pure next token prediction, where the model predicts the answer in one step. With reasoning, it creates intermediate steps—additional context that, combined with your prompt, allows it to arrive at better answers for complex problems.
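A schematic helps make this concrete. The strings below are invented for illustration (a real reasoning trace is generated by the model itself, not hard-coded), but they show the mechanics: the chain of thought becomes additional context, and the final answer is predicted from the prompt plus those intermediate steps.

```python
prompt = "A project has 3 phases of 6 weeks each plus 2 weeks of review. How long is it?"

# Without reasoning: the final answer is predicted directly from the prompt.
context_without_reasoning = prompt

# With reasoning: the model first generates a chain of thought for itself...
chain_of_thought = (
    "Step 1: 3 phases x 6 weeks = 18 weeks. "
    "Step 2: Add the 2 weeks of review. "
    "Step 3: 18 + 2 = 20 weeks."
)

# ...and the final answer is then predicted from the prompt PLUS those steps.
context_with_reasoning = prompt + "\n" + chain_of_thought

print(context_without_reasoning)
print("---")
print(context_with_reasoning)
```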
Here’s a revealing moment from the podcast: when OpenAI first trained GPT-4, the team was “pretty underwhelmed internally.” Tworek recalls, “We trained this model, we spent a lot of money on it, and it’s kind of pretty dumb.” The model could answer simple questions requiring one token of prediction, but when it generated longer responses, “it wasn’t very coherent.” Pure next token prediction, even at GPT-4’s scale, wasn’t enough. What transformed GPT-4 from underwhelming to useful was reinforcement learning from human feedback—teaching the model behaviors through rewards and punishments. Tworek uses a simple metaphor: “It’s like training a dog. Whenever you see your dog behave well, you smile and give your dog a treat. Whenever you see your dog do something bad, you give your attention away.” Human reviewers scored the model’s outputs, and the model learned from these scores. “In the end,” Tworek explains, “GPT-4 plus RLHF together as a package delivered the ChatGPT moment to the world.”
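As a cartoon of that dynamic, consider the sketch below. Everything in it is invented for illustration (real RLHF adjusts billions of model weights, not a three-entry preference table), but it shows the feedback loop: behaviors that earn high scores from reviewers become more likely over time, and behaviors that earn low scores fade.

```python
import random

# The model's initial tendency toward each kind of response (invented numbers).
policy = {"helpful and accurate": 0.34, "confidently invented": 0.33, "curt": 0.33}

# The score a human reviewer gives each behavior: the "treat" in Tworek's metaphor.
human_reward = {"helpful and accurate": 1.0, "confidently invented": 0.0, "curt": 0.3}

learning_rate = 0.05
for _ in range(200):
    # The model responds according to its current tendencies...
    behavior = random.choices(list(policy), weights=list(policy.values()), k=1)[0]
    # ...and the reviewer's score nudges that tendency up or down.
    policy[behavior] = max(policy[behavior] + learning_rate * (human_reward[behavior] - 0.5), 0.01)
    total = sum(policy.values())
    policy = {k: v / total for k, v in policy.items()}

print({k: round(v, 2) for k, v in policy.items()})
# After many rounds, "helpful and accurate" dominates the model's tendencies.
```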
OpenAI’s reasoning models have evolved quickly. o1, released in September 2024, was “really mostly good at solving puzzles,” Tworek admits—“almost more like a technology demonstration.” o3 changed that significantly, becoming meaningfully useful and capable of using tools, drawing on contextual information, and persevering toward answers. GPT-5 is essentially “o3.1”—an iteration on the same concept. But the evolution continues. Tworek notes that models can now think for 30 minutes, an hour, even two hours on certain tasks, though OpenAI is still figuring out how to make extended thinking useful for real-world problems like coding, planning, and complex analysis.
What’s striking about the podcast is how clearly Tworek articulates OpenAI’s core approach, unchanged since early 2019: “Train large generative models on all the data we can, and then do reinforcement learning on it.” This two-part formula matters for understanding what these tools can and cannot do. Pre-training (next token prediction on massive amounts of text) gives models broad knowledge and pattern recognition. Reinforcement learning shapes that knowledge into useful behaviors and adds the ability to reason through problems step by step. Neither works well alone. “RL needs pre-training to be successful,” Tworek emphasizes. “And I think pre-training needs RL to be successful as well.”
Understanding reasoning models helps you make better decisions about how to use AI in your organization. When you ask a reasoning model to solve a complex problem, it’s working through the problem step by step, creating intermediate reasoning that improves its final answer. This is why reasoning models perform dramatically better on tasks requiring logic, mathematics, coding, and complex analysis. It’s also why they take longer to respond—that “thinking” time is the model generating its chain of thought. The shift from o1 to o3 to GPT-5 over just one year shows how quickly capabilities are advancing. As Tworek notes, if you’d shown someone ChatGPT from today ten years ago, “they would probably call it AGI.” But we don’t, because we can see its limitations clearly—and because researchers are already working on resolving them.
The better you understand how these systems actually work—next token prediction as the foundation, reinforcement learning to shape behaviors, reasoning to work through complex problems—the better you can deploy them effectively and avoid their weaknesses. Reasoning models represent one of the most important developments in AI over the past year. They’ve transformed these systems from sophisticated autocomplete into tools capable of genuine problem-solving. Understanding what happens when your AI “thinks” isn’t just technical curiosity. It’s essential knowledge for using these tools well.
Understanding the AI Boom
16 charts that help tell the story.
It can be hard to make sense of what’s happening with generative AI from an economic perspective. Deals between AI labs and other technology companies are happening left and right, there’s increasing noise about a potential AI bubble, and we’re seeing buildouts of massive data centers throughout the country.
Understanding AI published “16 Charts that Explain the AI Boom” this week, offering a grounded look at generative AI’s current economic footprint rather than speculative predictions about the future. The charts examine the economy from multiple angles with concise write-ups worth reading in full, but two points stand out as particularly important for understanding this moment.
First, the biggest companies are investing heavily in generative AI, and this investment represents a meaningful part of the economy. In 2024, five major tech companies spent $241 billion in capital expenditures. In the second quarter of 2025, they spent $97 billion, an annualized pace of roughly $390 billion. At this pace, AI-related spending will exceed peak investment levels from some of the most significant infrastructure booms in modern history. We won’t speculate about what this will mean for the future, as we’ll surely be wrong, but it gives you a sense of the scale of investment we’re seeing in historical terms.
Second, people are using AI more and more. ChatGPT doubled its weekly active users between February (400 million) and October (800 million) of this year. Google went from processing 480 trillion tokens in May 2025 to 1.3 quadrillion tokens in October 2025. The growth in usage over just the past few months has been remarkable, reflecting an accelerating pace of adoption. Anecdotally, this tracks with what we’re seeing. Between conversations with clients and even with friends and family, it’s become a rarity to find someone who doesn’t use generative AI at least occasionally. It’s noticeably different from a year ago.
There’s value in understanding the broader economic picture when it comes to generative AI, even at a cursory level. The forces driving massive capital investment, accelerating adoption, and infrastructure constraints shape the context in which every organization operates, influencing pricing, capability access, and competitive dynamics. The Understanding AI piece examines additional dimensions worth being aware of, from chip imports to the financial realities facing companies like OpenAI and Anthropic. You don’t need deep expertise in any of these areas. A high-level familiarity with the broader landscape provides useful context for your decisions and helps explain what’s happening around you.
Editing as an Essential Skill
Using AI well means making choices.
Within our firm, we’ve been discussing whether and how we should prioritize editing as much as, or more than, writing as an essential skill for our team. As we wrote last week, we believe careful review will be a critical part of ensuring our AI use doesn’t become negligent. Two articles published this week help to explain why.
Howard Berg’s New York Times essay and John Nosta’s in Psychology Today share a similar and increasingly common anxiety about how AI will change how we think — or, really, how we don’t. Berg points out that “Long before ChatGPT, the smartphone and the calculator, Plato warned against writing itself,” fearing it might limit humans’ ability to memorize complex texts (he wasn’t wrong, as you don’t see many people reciting The Odyssey in its entirety anymore). For Berg, Plato’s fear of writing is quaint, because AI threatens something much more fundamental: our “capacity to think.”
But what Berg leaves out is that for Plato, the written word is really a dangerous technology because it allows language to go beyond the author and, potentially, take on a life of its own. The irony of Berg’s reference to Plato is that it highlights how good thinking and writing have long been about engaging carefully with an unruly technology, about making considered choices that hone the unwieldiness of language into a more precise (though never fully yielding) tool. This is the same kind of discernment—the kind of editorial skill—that will only matter more as we continue to engage with AI.
Nosta’s piece offers a sharper framework. For Nosta, choice is what makes meaning: “Every act of human thought narrows possibilities. We collapse uncertainty into a line of meaning.” AI does the opposite. “When a large language model responds to a prompt, it expands. It takes a single, collapsed input and broadens it into a spectrum of possibilities.” Then we, the human users, must choose what to take and what to leave. The problem comes when we mistake the expansiveness of AI-generated output for meaningfulness and let AI make all our choices for us, when we “outsource the burden of deciding.” What often connects experienced thinkers, leaders, and advisors is their ability to take on that burden — to do the hard work of making tough, thoughtful decisions. The task ahead is to ensure less experienced professionals learn to do the same, and editorial skills might be a good place to start.
We’ll leave you with something cool: Google announced its generative AI tool for digital marketing, Pomelli.
AI Disclosure: We used generative AI in creating imagery for this post. We also used it selectively as a creator and summarizer of content and as an editor and proofreader.
