Confluence How-To: Using LLMs as Thought Partners for Image Generation
From ideation to prompt design, large language models can be valuable thought partners in the image generation process.
We use generative AI (usually Midjourney, our preferred model) to create the cover image for each edition of Confluence. We do this to add some visual appeal to our posts, of course, but also to show our readers what’s possible with these tools. To make the process transparent, we include the prompt for every AI-generated image we use. Our most detail-oriented readers may have noticed that for our three most recent editions (including this one), we’ve noted that the Midjourney prompt was generated by Claude 3 Opus. That’s because using large language models, particularly Claude 3 Opus, has become a valuable part of our image generation process.
We’ve experimented with using large language models in the image creation process for several months now. The more we’ve experimented with Claude 3 Opus, however, the more we’ve been impressed with the model’s usefulness as a thought partner in the entire AI-powered image generation process, from brainstorming and ideation through prompt development. We’ve found it useful enough to dedicate this edition to providing a bit of a “behind the scenes” look at our process in the hope that our readers will find it useful as well.
The first exchange below shows the process we used to generate the cover image for this past Sunday’s edition of Confluence. It’s an easy, efficient way to engage the assistance of an LLM in the image generation process. The entire process, from initial prompt in Claude to final image in Midjourney, took less than five minutes.
We begin by deciding on the content or concept for the image:
User: I write a substack on the intersection of AI and communication. For the cover art of our weekly post, we use Midjourney to generate an image. This coming edition's first item is about The Danger, and Power, of LLM-Backed Persuasion. Please give me 10 high-level ideas for an image we could create to evoke this theme.
Claude 3 Opus: Here are 10 high-level ideas for images that could evoke the theme of "The Danger, and Power, of LLM-Backed Persuasion":
1. A person with a megaphone that has an AI interface, symbolizing the amplification and reach of AI-powered persuasion.
2. A puppet master controlling strings attached to people, with the strings made of binary code or text, representing AI's influence over human behavior.
3. A person wearing a VR headset with a thought bubble containing persuasive messages, highlighting the immersive and influential nature of AI-generated content.
4. A Trojan horse made of circuit boards and wires, symbolizing the hidden dangers of AI persuasion techniques.
5. A person standing at a crossroads, with one path leading to a utopian future and the other to a dystopian one, representing the dual nature of AI persuasion.
6. A person holding a glowing orb with an AI interface, casting a spell on a crowd, depicting the almost magical power of AI persuasion.
7. A split image showing a person before and after being influenced by AI persuasion, with the "after" image showing them as a robot or clone.
8. A person wearing a suit covered in persuasive text and symbols, with an AI halo above their head, portraying AI as a persuasive force.
9. A chess board with human and AI players, with the AI player subtly influencing the human player's moves, illustrating the strategic nature of AI persuasion.
10. A person looking into a mirror that reflects an enhanced, idealized version of themselves, symbolizing the persuasive power of AI to shape self-perception and beliefs.
Those are pretty good options (some better than others, certainly), and Claude produced them in about five seconds. We found #2 and #5 most interesting. For efficiency’s sake, we chose #2. With the content of the image decided, we move on to the style and any artistic, compositional, or aesthetic references or considerations we want our image to evoke. We’ve found that without this step, Midjourney and other generative image models (especially DALL-E 3, which powers the image generation capabilities of both ChatGPT and Microsoft Copilot) tend to default to what we’ve been calling a “house style,” which is vaguely cartoonish and immediately recognizable to anyone who is familiar with these tools.
User: I like #2. Who are some artists or artistic movements whose style might work for something like this? We often like to reference a particular style.
Claude 3 Opus: When considering artists or artistic movements that could inspire an image of a puppet master controlling people with strings made of binary code or text, you might look to the following:
Surrealism: Artists like Salvador Dalí and René Magritte often depicted dreamlike, irrational scenes that challenge our perception of reality. Their style could lend itself well to the unsettling nature of the proposed image.
Bauhaus: This German art school and movement emphasized geometric forms, bold colors, and the integration of technology and art. The clean lines and futuristic elements could work well for depicting the AI aspect of the image.
Expressionism: Artists like Edvard Munch and Egon Schiele used bold colors, distorted forms, and emotive brushstrokes to convey inner turmoil and psychological states. This style could help emphasize the manipulative nature of the AI puppet master.
M.C. Escher: Known for his mind-bending, mathematically-inspired drawings and prints, Escher's style could be adapted to create a complex, interconnected web of strings and binary code.
Futurism: This early 20th-century Italian movement celebrated technology, speed, and the machine age. Its dynamic, forward-looking aesthetic could be used to portray the advanced nature of AI persuasion.
Pop Art: Artists like Andy Warhol and Roy Lichtenstein used bold colors, graphic elements, and references to popular culture. This style could be used to create a striking, contemporary image that comments on the pervasiveness of AI in our lives.
Digital Art: Contemporary digital artists like Michael Takeo Magruder and Refik Anadol often use data, algorithms, and AI in their work. Their cutting-edge techniques could be used to create a truly AI-inspired piece.
Those are all fairly obvious, conventional responses, which is not surprising. Unless specifically prompted otherwise, the LLM (whether Claude 3 Opus or GPT-4) does not typically produce the most creative or striking ideas when it comes to artistic or compositional style (where it often falls back on the same themes of surrealism, futurism, and the others you see above). If we were so inclined (and we often are, as you’ll see in our next example), we could have pushed Claude to go into more obscure or unexpected territory. But for simplicity’s sake, we continued:
User: Great. Can you please generate prompts for me for each of these styles?
Claude 3 Opus: Certainly! Here are prompts for each of the artistic styles mentioned:
Surrealism: "A surreal image of a giant puppet master with a face made of circuit boards, controlling people with strings made of glowing binary code and text, set in a dreamlike, irrational landscape."
Bauhaus: "A Bauhaus-inspired image of a geometric, robotic puppet master with clean lines and bold colors, controlling people with strings made of binary code and text, set against a futuristic, grid-like background."
Expressionism: "An Expressionist image of a distorted, emotive puppet master with exaggerated features, controlling people with strings made of binary code and text, using bold colors and agitated brushstrokes to convey a sense of manipulation and inner turmoil."
M.C. Escher: "An Escher-inspired image of a complex, interconnected web of binary code and text strings controlled by a puppet master, creating a mind-bending, mathematically-inspired pattern that challenges perception."
Futurism: "A Futurist image of a dynamic, forward-looking puppet master controlling people with strings made of binary code and text, celebrating the speed and advanced nature of AI persuasion, with a sense of movement and energy."
Pop Art: "A Pop Art image of a bold, graphic puppet master controlling people with strings made of binary code and text, using vibrant colors and references to popular culture to create a striking, contemporary commentary on the pervasiveness of AI in our lives."
Digital Art: "A cutting-edge digital art image of a data-driven, AI-generated puppet master controlling people with strings made of binary code and text, using algorithms and machine learning techniques to create a truly AI-inspired piece."
Feel free to adjust these prompts as needed to better fit your vision for the cover art.
We used the Bauhaus prompt above in Midjourney, which generated the following four options:
We then prompted Midjourney to generate four alternative options of the image on the bottom left and received the following four variations:
Finally, we selected and upscaled the top-left image for our cover image in Confluence. Again, the entire process — every word of which is included in the exchange above — took less than five minutes from start to finish.
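For readers who want to reuse this exchange, the three user turns above can be captured as fill-in templates. The Python sketch below is an illustration only, not a tool we used: the function names and the `newsletter_focus`, `item_title`, `n_ideas`, and `chosen_idea` parameters are hypothetical, the wording follows the prompts quoted above, and the code only builds text to paste into Claude (it calls no API).

```python
def ideation_turn(item_title: str, newsletter_focus: str, n_ideas: int = 10) -> str:
    """First turn: ask for high-level image ideas for the edition's lead item."""
    return (
        f"I write a substack on {newsletter_focus}. "
        "For the cover art of our weekly post, we use Midjourney to generate an image. "
        f"This coming edition's first item is about {item_title}. "
        f"Please give me {n_ideas} high-level ideas for an image we could create "
        "to evoke this theme."
    )


def style_turn(chosen_idea: int) -> str:
    """Second turn: pick an idea by number and ask which styles might suit it."""
    return (
        f"I like #{chosen_idea}. Who are some artists or artistic movements whose "
        "style might work for something like this? "
        "We often like to reference a particular style."
    )


def prompt_turn() -> str:
    """Third turn: ask for one image prompt per suggested style."""
    return "Great. Can you please generate prompts for me for each of these styles?"


# Hypothetical usage that reproduces the exchange above.
turns = [
    ideation_turn(
        item_title="The Danger, and Power, of LLM-Backed Persuasion",
        newsletter_focus="the intersection of AI and communication",
    ),
    style_turn(chosen_idea=2),
    prompt_turn(),
]
for number, text in enumerate(turns, start=1):
    print(f"Turn {number}: {text}\n")
```

The second turn can only be written after reading Claude's answer to the first, so the templates are meant to be filled in one at a time rather than sent as a batch.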
A Different Approach
For the cover image for this edition, we decided to take a different approach. We began by explaining what we needed and asking Claude for guidance on a process to follow (which, as we’ve noted before, is not a bad place to start for any use case you’re considering):
User: I am a co-writer of a Substack focused on the intersection of AI and communication. I'm responsible for using Midjourney to generate the cover image for this week's edition. We strive for our images to be visually compelling and relevant to the theme of the edition. We try to avoid cliche or conventional ideas. I know that you can be a powerful, creative thought partner, and I need your help coming up with the image. How can we best work together to do that?
Claude 3 Opus: I would be happy to brainstorm ideas with you for a compelling cover image! To make sure we're on the same page, could you provide some more context about the specific theme and topics that will be covered in this week's edition? The more I understand about the key concepts, tone, and unique angles you're planning to explore, the better I can tailor my creative suggestions to align with and enhance your content.
Once I have a clearer sense of the edition's focus, here's how I would propose structuring our collaboration:
I'll start by offering 3-4 high-level visual concepts that aim to capture the essence of your theme in an eye-catching, thought-provoking way. These will be short, 1-2 sentence descriptions of potential cover image ideas.
You can then let me know which of those initial concepts resonate with you and seem most promising to pursue. Feel free to provide feedback on what you like or dislike, and any ways you'd want to refine or build upon the ideas.
Based on your input, I'll flesh out the most promising concept(s) in more detail, drafting a couple possible text prompts you could provide to Midjourney to generate the kind of image we've brainstormed.
We can continue to iterate and refine the prompt until you feel you have a solid starting point for generating cover image options in Midjourney. I'm happy to then provide feedback and suggestions as you experiment with prompts and generated images.
The key is for this to be an iterative back-and-forth, combining my ability to come up with creative ideas with your editorial judgment and Midjourney experience. Let me know if this proposed approach makes sense, and feel free to share any other thoughts on how I can be most helpful as a thought partner in this process. I'm excited to put our minds together and come up with a powerful, imaginative cover image!
That approach sounded right to us, so we continued.
User: Sounds good. The title of the post is "Using LLMs for Image Generation". It's a practical guide to using tools like Claude or ChatGPT to brainstorm image ideas and write prompts to generate images in Midjourney.
Claude 3 Opus: Great, thanks for sharing the post title and focus. Brainstorming cover image concepts now...
"Prompts as Portals": Depict a large language model (represented by a giant brain or sprawling neural network) projecting beams of light that pass through a series of "prompt portals." Each portal shapes and filters the light, transforming it into a unique, vivid image on the other side. This represents how LLMs take simple text prompts and generate rich, detailed images through tools like Midjourney.
"The Collaborative Canvas": Show two hands (one human, one robotic) working together on a digital canvas or virtual easel, crafting an intricate, colorful image. The human hand could be typing or dictating prompts, while the robotic hand translates those words into brushstrokes and visual elements. This highlights the collaborative creativity between human prompters and AI image generators.
"Wordsmiths and Visionaries": Create a vibrant, surreal landscape where words and phrases (representing prompts) are physically interwoven with the terrain, flora, and fauna. Imagine trees with leaves made of adjectives, mountains shaped like descriptive clauses, and creatures formed from evocative nouns. In the sky, a luminous Midjourney logo shines down, illuminating and bringing life to the linguistic landscape below.
"Imagination Ignition": Picture a person (perhaps in silhouette) standing before an enormous, vintage computer console with levers, dials, and blinking lights. As they input prompts on a keyboard, the machine whirs to life, and a dazzling explosion of colors, shapes, and images burst forth from the screen, washing over the person and their surroundings. This represents the imaginative spark that occurs when human creativity meets AI image generation.
Let me know which of these concepts catch your eye, or if there are elements from different ideas you'd like to combine and explore further. I'm happy to refine and flesh out the most promising directions!
These image concepts are both more detailed and — in our view — more interesting than the image ideas provided by Claude in our first example (though that example did end up with a compelling enough image). We decided to keep our options open and move forward with all of these concepts still in play. As we did in our first example, we then pivoted to a discussion of style. Note that this time, we explicitly instructed Claude to avoid the obvious and “dig a bit deeper.” It obliged:
User: All four of these are interesting. For our cover images, we often like to evoke a certain artistic style or aesthetic. Who are some artists whose style we could reference for each of these four concepts? Let's try to avoid obvious, conventional picks (like Picasso or Dali) and dig a bit deeper.
Claude 3 Opus: Absolutely, let's explore some more distinctive and evocative artistic styles for each concept:
"Prompts as Portals":
Yayoi Kusama: Known for her immersive installations and polka dot patterns, Kusama's style could create a mesmerizing, infinity-mirror effect for the prompt portals.
Zdzisław Beksiński: His surreal, dystopian landscapes and biomechanical forms could lend an eerie, otherworldly atmosphere to the AI-generated images.
"The Collaborative Canvas":
Jean-Michel Basquiat: His neo-expressionist style, with bold colors, graffiti-like scrawling, and frenetic energy, could capture the dynamic interplay between human and AI creativity.
Hilma af Klint: Her abstract, geometric compositions and luminous color palette could create a sense of cosmic harmony and spiritual resonance in the collaborative art-making process.
"Wordsmiths and Visionaries":
Giuseppe Arcimboldo: Known for his portraits composed of objects like fruits, vegetables, and books, Arcimboldo's style could literally incorporate words and language into the fabric of the surreal landscape.
Octavio Ocampo: His metamorphic style, featuring images within images and optical illusions, could create a sense of words and prompts subtly shape-shifting into unexpected visual forms.
"Imagination Ignition":
Katsushika Hokusai: The ukiyo-e woodblock style, with its vibrant colors, dynamic lines, and sense of movement, could evoke the explosive burst of AI-generated imagery.
Julie Mehretu: Her large-scale abstract paintings, with layered gestures, geometric shapes, and architectural elements, could capture the complex interplay of algorithms and imagination in the creative process.
These are just a few possibilities, but they should give you a sense of how referencing specific artists can help refine the visual style and emotional resonance of each concept. Let me know if any of these artistic directions inspire you, or if you'd like me to suggest some other options!
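The difference between this style request and the one in our first example is the explicit exclusion of the usual suspects. As a minimal sketch of that idea (the function name `deeper_style_turn` and its parameters are hypothetical, and the wording simply mirrors our prompt above), the exclusion list can be made a parameter so it is easy to extend when a model keeps returning the same names:

```python
def deeper_style_turn(concept_count: int, avoid: list[str]) -> str:
    """Ask for artist references per concept while ruling out conventional picks."""
    examples = " or ".join(avoid)  # e.g. "Picasso or Dali"
    return (
        "For our cover images, we often like to evoke a certain artistic style or aesthetic. "
        "Who are some artists whose style we could reference for each of these "
        f"{concept_count} concepts? "
        f"Let's try to avoid obvious, conventional picks (like {examples}) and dig a bit deeper."
    )


# Hypothetical usage mirroring the request above.
request = deeper_style_turn(4, ["Picasso", "Dali"])
print(request)
```

If a later response still leans on familiar names, adding them to `avoid` and asking again is one way to push further; we have not tested how far that scales.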
From there, we asked for specific prompts:
User: Thank you. For each of these options, please write a Midjourney prompt.
Claude 3 Opus: Certainly! Here's a Midjourney prompt for each concept and artistic style:
"Prompts as Portals" + Yayoi Kusama: /imagine prompt: Gigantic neural network brain with polka dot prompt portals shining beams of light, transforming into infinite mirrored AI-generated images, in the mesmerizing immersive style of Yayoi Kusama, highly detailed, 4k resolution
"The Collaborative Canvas" + Jean-Michel Basquiat: /imagine prompt: Human and robotic hands collaborating on a digital canvas, human typing prompts, robot translating into neo-expressionist brushstrokes, bold graffiti-like text, frenetic energy, vibrant colors, in the style of Jean-Michel Basquiat, highly detailed, 4k resolution
"Wordsmiths and Visionaries" + Giuseppe Arcimboldo: /imagine prompt: Surreal landscape with trees made of adjectives, mountains of descriptive clauses, creatures of evocative nouns, illuminated by Midjourney logo in the sky, words literally integrated into the fabric of the landscape, in the portrait-like compositional style of Giuseppe Arcimboldo, highly detailed, 4k resolution
"Imagination Ignition" + Julie Mehretu: /imagine prompt: Silhouette figure at a vintage computer console, inputting prompts, explosive burst of AI-generated imagery washing over them, complex layering of geometric shapes, architectural elements, and dynamic gestural marks, in the large-scale abstract style of Julie Mehretu, highly detailed, 4k resolution
Feel free to copy and paste these prompts directly into Midjourney, or use them as starting points and tweak the language and details to better suit your vision. The key is to provide a vivid, specific description that combines the core concept with the evocative elements of the referenced artistic style. Happy generating!
After generating images in Midjourney from each of these prompts (the other three, as we ran them, are listed at the end of this post), we decided to use the image we got from prompt option #4, the “Imagination Ignition” concept. Note that this process did involve more steps than our first example, but still took less than 20 minutes total. There are any number of paths down which we could have continued (for example, asking for more prompts for the “Imagination Ignition” concept, which we quickly realized was the best suited to Midjourney), but we were satisfied with the bottom-left image and decided to end the process there.
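Claude's prompts begin with "/imagine prompt:", while the versions listed at the end of this post begin with "/imagine" and end with the parameters `--ar 4:3 --v 6.0` (aspect ratio and model version). The sketch below shows one way to make that conversion in Python. It is an assumption-laden illustration, not our actual tooling: the function name `to_midjourney_command` and its defaults are hypothetical, and the output format simply copies the form shown at the end of this post.

```python
def to_midjourney_command(
    claude_prompt: str, aspect_ratio: str = "4:3", version: str = "6.0"
) -> str:
    """Rewrite a prompt as Claude produced it into the form listed at the end of this post."""
    text = claude_prompt.strip()
    # Drop the leading command Claude added so it is not duplicated below.
    # The longer prefix is checked first so "prompt:" is removed along with "/imagine".
    for prefix in ("/imagine prompt:", "/imagine"):
        if text.startswith(prefix):
            text = text[len(prefix):].strip()
            break
    # Append the aspect-ratio and version parameters after the description.
    return f"/imagine {text} --ar {aspect_ratio} --v {version}"


# Hypothetical usage with the first prompt from Claude's response above.
claude_output = (
    "/imagine prompt: Gigantic neural network brain with polka dot prompt portals "
    "shining beams of light, transforming into infinite mirrored AI-generated images, "
    "in the mesmerizing immersive style of Yayoi Kusama, highly detailed, 4k resolution"
)
command = to_midjourney_command(claude_output)
print(command)
```

Keeping the parameters out of the conversation with Claude and adding them at the end means the aspect ratio or model version can change without rewriting any prompt text.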
Interested in Julie Mehretu, the artist whose style inspired this edition’s cover art? Find her on Instagram.
AI Disclosure: We used generative AI in creating imagery for this post. We also used it selectively as a creator and summarizer of content and as an editor and proofreader.
Here are the prompts for the other three concepts, as we ran them in Midjourney:
“Prompts as Portals” — /imagine Gigantic neural network brain with polka dot prompt portals shining beams of light, transforming into infinite mirrored AI-generated images, in the mesmerizing immersive style of Yayoi Kusama, highly detailed, 4k resolution --ar 4:3 --v 6.0
“The Collaborative Canvas” — /imagine Human and robotic hands collaborating on a digital canvas, human typing prompts, robot translating into neo-expressionist brushstrokes, bold graffiti-like text, frenetic energy, vibrant colors, in the style of Jean-Michel Basquiat, highly detailed, 4k resolution --ar 4:3 --v 6.0
“Wordsmiths and Visionaries” — /imagine Surreal landscape with trees made of adjectives, mountains of descriptive clauses, creatures of evocative nouns, illuminated by Midjourney logo in the sky, words literally integrated into the fabric of the landscape, in the portrait-like compositional style of Giuseppe Arcimboldo --ar 4:3 --v 6.0