Confluence for 3.2.2025
ChatGPT Deep Research now available for Plus users. Amazon announces Alexa+, powered by Claude. How one AI insider uses LLMs. Our current model stack.

Welcome to Confluence. It has been very busy in the world of generative AI, with substantive new models from OpenAI, Anthropic, and xAI in just the past few weeks. We have a lot to talk about — more than we can get to all at once. That said, here’s what has our attention this week at the intersection of generative AI and corporate communication:
ChatGPT Deep Research Now Available for Plus Users
Amazon Announces Alexa+, Powered by Claude
How One AI Insider Uses LLMs
Our Current Model Stack
ChatGPT Deep Research Now Available for Plus Users
The frontier continues to get cheaper and more accessible.
Less than a month ago, we wrote about ChatGPT’s Deep Research. It stunned us. We felt that we had crossed another threshold in generative AI capabilities as we now had “an agent that [we] can assign to research and report on a topic, leave for a while, and come back to have a piece of work that is equal to or above most human standards.” The one catch to Deep Research was that it was only available to ChatGPT Pro users who paid $200 per month.
This is no longer the case. This week, OpenAI made Deep Research available to all ChatGPT Plus, Team, Enterprise, and Edu users. Users at these tiers receive 10 queries per user per month, which is plenty to get a sense of how powerful this tool can be. We think of it this way — for $20 per month, you can get 10 customized research reports that are at or above the level many humans would produce, and it will only take 10 minutes or so to generate each. Even better, you can ask for these reports whenever you want, on any topic that catches your interest, without pulling anyone away from their other work. It makes it much easier and less costly, in terms of time, energy, and attention, to explore whatever topics you choose, even if it’s a passing interest.
If you’re skeptical about exactly how smart Deep Research is, Timothy Lee conducted an experiment that is worth your time to read. He asked 19 of his readers, who work as teachers, attorneys, engineers, and more, to use Deep Research and evaluate its output. Of these 19 professionals, 17 judged the output to compare favorably with the work of at least an entry-level worker in their field, and seven compared it favorably to the work of experienced professionals. And it produces these outputs in minutes, as opposed to hours or days.
We do our best at Confluence to point you to the developments in generative AI that have our attention (and we believe should have yours). If you haven’t used Deep Research yet, we highly suggest testing it now. We believe you’ll be stunned, as we have been.
Amazon Announces Alexa+, Powered by Claude
A new opportunity for mass adoption of generative AI.
This week, Amazon unveiled Alexa+, its next-generation virtual assistant powered by generative AI through a partnership with Anthropic and built on Anthropic’s very powerful Claude large language model (the model we use most day-to-day). Amazon says the new Alexa will engage in “expansive and natural” conversations and will be more capable of helping users accomplish daily tasks. Amazon plans to roll out Alexa+ in the U.S. in the next several weeks during an early access period (you can request to be considered for early access here), prioritizing owners of more sophisticated Alexa devices. The service will be free for all Amazon Prime members and will cost non-members $19.99/month. The planned rollout could bring advanced AI capabilities to millions of households in the very near future.
This integration could foster a new level of public understanding of AI’s practical value. When Amazon first launched Alexa in 2014, it created a new product category that quickly entered millions of homes. Today, over 600 million Alexa devices operate worldwide, with many users incorporating them into daily routines: checking the weather, playing music, controlling other smart devices across the home. Alexa’s established presence and familiarity provide a unique platform for AI adoption, one that requires minimal behavior change or new learning from users. Should Anthropic and Amazon execute well, this partnership could demonstrate AI’s capabilities in an approachable way that resonates beyond the tech-savvy crowd.
But that’s a considerable “if.” We wrote last year with optimism about Apple’s Apple Intelligence putting the power of AI in pockets across the world — but the technology has disappointed. A quick Google search for “why is Apple Intelligence so bad” produces pages of explainers trying to answer that question.
We’re not assuming Alexa+ will immediately revolutionize generative AI adoption. That said, the development deserves attention. Amazon claims technical breakthroughs, including “experts” (systems, capabilities, and instructions for specific tasks) and “agentic capabilities” that let Alexa navigate the internet independently to complete tasks. The real test for Alexa+ lies not in Amazon’s ambitious presentation, but in whether it delivers genuine utility in living rooms once the marketing spotlight fades. We’ll be watching with interest.
How One AI Insider Uses LLMs
OpenAI founding member Andrej Karpathy’s new video provides a valuable overview of the current state of tools and capabilities.
We first pointed Confluence readers to one of Andrej Karpathy’s videos in November of 2023. The video in that case was his one-hour “Intro to Large Language Models” talk, which showcased Karpathy’s ability to translate his deep expertise into accessible explanations for non-technologists. In recent weeks, he’s published two new videos, again aimed at a general, non-technical audience. This week we watched one of them, “How I Use LLMs”, and recommend that anyone looking to catch up or go deeper on the current state of LLM tools and capabilities do the same. It’s long, at just over two hours, but if you break it into chunks (as we did), it’s manageable.
Perhaps the greatest strength of the video is its practical focus. It’s “an example-driven, practical walkthrough of large language models and their growing list of related features.” Much like we do in our seminars and in sessions with clients, Karpathy spends nearly the entire video demonstrating these tools and capabilities rather than explaining them. Equally valuable is Karpathy’s coverage of a wide range of tools, including ChatGPT, Claude, Gemini, Grok, Perplexity, NotebookLM, and Cursor. Many overview videos like this focus on one specific tool, so to see Karpathy work through all of these and demonstrate their similarities and differences in real time is particularly valuable.
To provide a sense of just how much he covers, in terms of both tools and features, here’s the time-stamped list of segments in the video:
The “chapters” shown above move from simpler to more complex as the video progresses, which is another strength of Karpathy’s approach. Over the course of the video, Karpathy builds a visualization of the current state of LLMs, culminating in the image below. He begins with a blank canvas and, in each chapter, adds a new piece, providing accessible descriptions and demonstrations of each as he does.

For anyone looking to get up to speed, not just on the latest tools and capabilities but on how they work and how they compare to each other, Karpathy’s video would be two hours well spent. It will almost certainly have a limited shelf life — in fact, Claude 3.7 Sonnet was released while Karpathy was filming the video, and GPT-4.5 was released the day after its publication. For now, though, it’s arguably the most comprehensive, accessible overview of the current landscape that we know of. And speaking of how quickly things are moving…
Our Current Model Stack
Things are changing fast, but here’s what we’re using, and for what, across today’s leading generative AI models.
Over the past few weeks it seems like powerful new generative AI models have been flying off the shelves like … whatever it is that flies off the shelves in your part of the world. Even we are having difficulty keeping up, and we have a lot of work to do to figure out where each model excels. That said, here’s the current stack of models we use daily for particular purposes, in case it’s of help. Your usage may vary.
Our daily driver: Claude 3.7 Sonnet (Anthropic). This is our go-to. We use it for analysis, writing, thought partnership, coding — just about everything other than the specific use cases below. We’ve preferred Claude for over a year now, and Claude 3.7 Sonnet remains our top choice for most of our work. Turning on its “thinking” option brings reasoning to its responses, making them even more powerful. Claude 3.7 Sonnet is amazing.
Research reports: OpenAI Deep Research. This tool creates long briefing documents or research reports (five to 20 pages or more, often with over 100 citations) on specific topics. It’s not yet truly reliable for academic literature review, as much of the literature remains behind paywalls, and it can over-index on some sources at times, but if you want to get reasonably smart about something in a short amount of time, it’s remarkable. Be sure to verify any facts that would pose reputational risk if wrong, and ask another model to critique the output to be sure you have a balanced perspective. Example query:
Research the concept of resilience. Consider what the published academic literature has to say about resilience in organizations, teams, communities, and individuals. Apply those findings to a leadership context, extrapolating reasonable conclusions that leaders should draw about fostering resilience in the people they lead in organizations and professional settings. Avoid cliches, tropes, and consultant mythology. Ground your search in what the literature has to say about the best ways to foster resilience in organizations, teams, and people. Pay particular attention to specific behaviors, routines, and actions that leaders can incorporate into their daily practice to foster resilience in others.
Deep thinking from a brilliant colleague: OpenAI o1. We use this model when we want very deep and particular thinking on a topic. Why “particular”? We’ve come to the view that large language models are a form of alien intelligence. We’re not saying they are sentient, but they clearly think differently than we do (in fact, nobody really knows how they think), and they are clearly intelligent. o1 is “very much so” on both of these dimensions. So we think of it as a brilliant colleague who sees things differently than we do. We don’t give it a lot of queries, but when we have a problem on which we’d like a smart and different perspective with a lot of reasoning behind it, o1 is our go-to. Example queries:
Research the time management practice of time boxing or time blocking. Consider how Cal Newport and Oliver Burkeman use this strategy. Offer specific strategies for how to apply it in my life, knowing that my calendar is often pre-planned in advance by my Executive Assistant. Consider the effects of the planning fallacy. Consider how I could best implement this knowing that my primary planning tools are Outlook Calendar and a remarkable 2 digital notebook. Make your thinking practical. As context, note that I lack deep work time, and I sense that I don't focus as much as I should on priorities and big rocks day-to-day. Ideal outcome is better throughput of priority items at work and home. I have blocks of time I can protect. I use the remarkable for notes and reflection, but can also use it for planning. Tasks are kept in a set of next action categories in a to-do app (for now at least).
Write your best case study to create a conversation about judgement among a group of mid-career strategic communication consultants in our firm. Don’t be beholden to popular forms like that of Harvard B School. Use your own intellect to create the content and design however you see fit with the goal of fostering a credible dialogue in which our consultants develop their professional judgment.
Image generation: Midjourney. Other image generation tools are out there, but for us, Midjourney is still the best tool going. We use it to create the header images in Confluence, and now that you can use it as a website and not via Discord, there’s no reason not to use Midjourney. Like any generative AI tool, what you get depends a lot on how you prompt, so spending some time to learn its ins and outs is helpful. We’ve even created a Claude Project that brainstorms image ideas and prompts for us. Example prompt (and the output):
Mark Rothko-inspired abstract expressionist painting, horizontal rectangular color fields transitioning from earth browns at bottom to vibrant reds and oranges at top, subtle digital texture embedded in the color transitions, corporate silhouettes at bottom edge, 4:3 aspect ratio, oil painting texture --ar 4:3 --v 6
Updates on current events: Grok 3. Not a lot of people are talking about it, but Grok 3 is a surprisingly good model, and it’s impressive when you consider that xAI developed it in months. It’s definitely different from the other models, in part because just about anything goes on Grok. It can be funny, weird, crude … and very effective, especially at summarizing current events, as it is a reasoning model that searches both the web and X. Again, verify any facts you plan to cite. Example query (and in this case, output, as we have not written much about Grok — in creating its response, Grok thought for 18 seconds and read 11 X posts and 15 web pages):
Us: The NASCAR race at COTA is today. Bring me up to speed on what's new with it over the past 24 hours.

Grok: The NASCAR Cup Series race at the Circuit of the Americas (COTA) is happening today, Sunday, March 2, 2025, at 3:30 p.m. ET. Here’s a rundown of what’s new and noteworthy over the past 24 hours as you get ready for the event.

New Track Layout and Race Format
This year, NASCAR has switched to a shorter version of the COTA circuit, dubbed the "National" circuit. It’s 2.4 miles long with 20 turns, a big shift from the full 3.4-mile layout used in previous years. The new design eliminates the long straightaways and introduces a sharp, acute left-hander at Turn 6A, which should open up more passing zones — and potentially lead to some on-track chaos. To adjust for the shorter track, the race has been extended to 95 laps, up from 68 laps in prior years. This means more racing action for fans and a fresh strategic challenge for teams.

Key Driver and Team Updates
William Byron, last year’s winner, is chasing consecutive victories at COTA but will start from 15th today. Tyler Reddick, the 2023 COTA winner, grabbed pole position and looks like a top contender. Shane van Gisbergen, a road course ace and three-time Supercars champion, has been fast in practice and is pegged as a favorite. Connor Zilisch, an 18-year-old phenom, is making his NASCAR Cup Series debut. He’s already won big races like the Rolex 24 at Daytona and the 12 Hours of Sebring, so all eyes are on him. On the team front, 23XI Racing dominated qualifying, locking out the front row, while Trackhouse Racing has three cars in the top 10, promising a fierce fight up front.

Technical Changes to Watch
Repaved Track Surface: Sections of the track, particularly Turns 2-8, have been freshly paved. This could shake up grip levels and tire wear, making adaptability key.
New Goodyear Tires: A new tire compound is in play this year, adding another layer of intrigue to race strategy. Teams will likely lean toward a three-stop strategy, though some might risk a two-stop gamble with the longer race.

Fan Experience and Weather
NASCAR’s keeping fans entertained with extras like dance troupes from America’s Got Talent and pig races. Weather’s looking ideal: sunny skies, 70s temperatures, and just a 5% chance of rain.

Future Uncertainty
There’s a question mark hanging over COTA’s NASCAR future. The rental deal between Speedway Motorsports and COTA expires this year, and while a 2026 return is anticipated, nothing’s locked in yet.

With a revamped track, a longer race, and a stacked field of veterans and rising stars, today’s event is gearing up to be a wild one. Enjoy the show!
There are other models that are good at video generation (Veo 2 from Google) and music (Suno AI), but we don’t use them much day-to-day. We don’t know much about where all this is going, but we do know this: our list will be different in six months. So we’ll continue to post it from time to time.
We’ll leave you with something cool: Beyond the technical benchmarks, there are some at Anthropic who are measuring Claude’s capabilities in a different way … how far it can get in playing Pokémon Red. You can even watch a livestream of Claude playing the game on Twitch.
AI Disclosure: We used generative AI in creating imagery for this post. We also used it selectively as a creator and summarizer of content and as an editor and proofreader.