Confluence for 10.15.23
GPT-4 Plus gets vision and image creation. Adobe launches Firefly. Stereotypes, bias, and AI-generated imagery. The effects of emotion in prompts. The OpenAIphone?

Welcome back to Confluence. There’s a lot going on at the intersection of AI and corporate communication, with new things coming along nearly every day. Here’s what has our attention:
GPT-4 Plus Gets Vision and Image Creation
Adobe Launches Firefly
Stereotypes, Bias, and AI-Generated Imagery
The Effects of Emotion in Prompts
The OpenAIphone?
GPT-4 Plus Gets Vision and Image Creation
The leading LLM leaps forward seemingly by the day.
When we started publishing Confluence eight weeks ago, the vast majority of ChatGPT users were able to benefit only from its textual features (powerful as they may be). Since then, OpenAI has broadly released Advanced Data Analysis (formerly Code Interpreter), voice, and, this past week, image analysis and image creation. The tool is evolving nearly before our eyes, so here are some observations from our (very initial) trials.
First, voice functionality, which is available via the ChatGPT app, is strong. Conversations with the chatbot are lifelike, and we have already found interesting use cases (two of which are, “Ask me and my child questions about a story we would like to hear and then tell us the story” and “I want to learn conversational [LANGUAGE]. Please help me practice by giving me common phrases in English, then in the language, and then listen to my phrasing and critique it for me.”).
Image functionality is impressive. We posted about its (seemingly limitless) use cases last week, and there’s another long thread of uses here (turning an image into a working website and turning any picture of food into a recipe among them). It has already amazed us a number of times. That said, as with most of these tools, there is no user guide, so we need to navigate the frontier of its capabilities and weaknesses on our own. We can already say that in our trials, it’s not great at math. When given a photo of a column from one of our internal spreadsheets, it read the data correctly and calculated the median correctly, but could not calculate the correct sum or average, even after multiple attempts. But it can do much, much more than math.
Creating images in GPT-4 via DALL-E is significantly easier than in Midjourney because you operate with plain-language chat and prompts. It’s much better at including text, and it seems more powerful than Bing (which also uses DALL-E). GPT also shows you the prompt it generated to create the image based on your own prompt, which is helpful for understanding the underlying prompt design mechanics, and for having it edit or modify the images you’re creating.
We do find it can be a bit harder to get what we want from GPT-4 than from Midjourney stylistically. Some of this is because of its internal restrictions. For example, when we asked, “Please create a 4 by 3 photograph by William Eggleston in ultra high definition of a person walking down the street contemplating AI,” GPT-4 Plus replied: “William Eggleston is a renowned photographer known for his pioneering work in color photography. However, referencing the style of artists or photographers directly is a limitation, especially if their works are from the last 100 years. Instead, I will use key descriptors of Eggleston's style to create a similar image.” It then gave us four prompts we could use, including “Photo of a city street during sunset. The golden hour light casts long shadows. A person of Asian descent, wearing a casual outfit, walks down the sidewalk, lost in thought. In their hand, they hold a book titled ‘Artificial Intelligence’. The surroundings show vibrant colors, reminiscent of everyday life, with parked cars, storefronts, and trees.” It then created four images, none of which looked much like an Eggleston photograph. Here’s the one based on the above prompt suggestion:
But when we prompted, “Please create a 4 by 3 photograph in ultra high definition of a person walking down the street contemplating AI. I would like you to draw inspiration from the work of William Eggleston in mood, scenery and setting, color, film stock, and composition,” it got right to work with no complaints about restrictions, giving us this:
The only problem? That doesn’t look much like an Eggleston photograph. When we gave this prompt to Midjourney, “/imagine a photograph by William Eggleston in ultra high definition of a person walking down the street contemplating AI --ar 4:3,” we got this:
That looks very much like Eggleston’s work.
So some of this is prompt engineering, and some is the tool. As for how the tools compare, we gave the same prompt used in our header image for this issue (“a landscape made of fused glass patterns to symbolize the vast possibilities of AI, autumn colors --ar 16:9”) to all three tools: GPT-4, Bing, and Midjourney. Below are the images they created (note that each tool creates four images in each run, and that Bing does not create anything other than 1:1 aspect ratio images):
Midjourney:
Bing:
GPT-4 with DALL-E:
Of these, we are most impressed with GPT-4, but it’s probably a matter of taste. We also note that only the Midjourney images actually look like fused glass art, as opposed to other forms of glass art.
That said, all are stunning and only took seconds to create. It’s easy to forget the astonishing image creation capabilities now in the hands of all who wish to have them. Really, this image by GPT-4 has us shaking our heads …
Taken together, these tools open significant combinatorial possibilities (ways of using generative AI that combine abilities across many different domains). Upload a file and ask for an analysis, have the bot draft an initial summary of the findings, have it run a regression analysis on some of the data and create a graph of the findings, then upload that graph and ask for an image based on it. And this is just one small slice of what’s possible.1
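To make the regression step in a workflow like this concrete, here is a minimal plain-Python sketch of what the tool does under the hood. The weekly subscriber counts below are invented for illustration; in practice, GPT-4 Advanced Data Analysis runs the equivalent for you on an uploaded file.

```python
# Sketch of the "run a regression and project growth" step described
# above. All numbers are made up for illustration.

def linear_fit(xs, ys):
    """Ordinary least-squares fit: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical subscriber counts for the first eight weeks of publishing.
weeks = list(range(1, 9))
subscribers = [120, 155, 190, 230, 260, 300, 335, 370]

slope, intercept = linear_fit(weeks, subscribers)

# Project the subscriber count one year (52 weeks) out.
projection = slope * 52 + intercept
print(f"~{slope:.0f} new subscribers/week; week-52 projection: {projection:.0f}")
```

A real analysis would also plot the fitted line against the actual counts — that chart is what we then handed back to GPT-4 as the basis for an image prompt (see the AI disclosure at the end of this issue).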
Adobe Launches Firefly
The most powerful firm in creative arts brings generative AI to its users.
Continuing with image generation, Adobe’s new venture into generative AI imagery, Firefly, is now accessible here to those with an Adobe account. It offers text-to-image, generative fill (make that 4:3 image a 16:9 by auto-filling the background), and text effects, all through the web browser. We’ve played with it for only about three minutes, but we’re impressed. It seems at least as good as Bing and perhaps GPT-4 DALL-E, and the ability to do generative fill and work with text is significant. Expect the creatives in your world to begin making the most of it in the coming weeks. Oh, and here’s its output for the same fused glass prompt (note that it put a frame around image #4):
Stereotypes, Bias, and AI-Generated Imagery
Generative AI models like Midjourney and DALL-E can amplify societal stereotypes, and the solutions are not straightforward.
As you can see, generative AI models like Midjourney and DALL-E are becoming very good at crafting images from text prompts. But there’s an argument that they’re also getting very good at reflecting biases in their training data. A recent deep dive by Rest of World generated 3,000 images tailored to various countries and found these AI systems often resort to troubling stereotypes. The extent to which this happens is a matter of some question, but whether it’s depicting an “Indian person” as an older man with a beard or presenting homes in Nigeria as run-down, these models can offer a distorted lens on diversity. It’s certainly something we’ve noticed in our own use of these tools.
The general belief is that this is a function of the training data — with the models simply representing the biases and stereotypes present in the corpus of data on the internet — and as a result, fixing it isn’t going to be easy. Leading AI scientist Gary Marcus has a view on that here. Tweaking the training data is one solution, but even that’s fraught with its own issues, as the human annotators who label the data have their own biases. And when you try to filter out certain kinds of content, you might end up skewing the data in new, unexpected ways.
So where does this leave us? The answer is still unfolding. What’s clear is that as generative AI becomes more integrated into various sectors — from advertising to criminal forensics — its potential to perpetuate stereotypes is a concern. And while the tech world is still grappling with how to fix this, it’s something to be aware of now as you begin to use these tools for your own image generation.
The Effects of Emotion in Prompts
A new paper signals the role word choice can play in effective prompt design.
The question of how artificial intelligence processes emotional cues is a subject of growing research. A recent working paper explores this by investigating how different versions of generative AI models respond to emotional prompts. These findings are particularly relevant for professionals in corporate communication and leadership who are using or considering using AI tools like GPT-4.
The research underscores the importance of being deliberate about the emotional context you’re introducing when crafting prompts for generative AI. For example, when we asked GPT-4, “Presume you have a set of quarterly revenue numbers. Please describe them.” it gave us …
Certainly, in a typical business context, a set of quarterly revenue numbers would provide information about the amount of money a company has earned over each of four three-month periods within a fiscal year. These numbers would usually be denominated in the company's home currency (e.g., USD, EUR, etc.) and would be a gross figure, not accounting for costs or expenditures.
Here's what you might see:
Q1 Revenue: $1.5 million
Q2 Revenue: $1.2 million
Q3 Revenue: $1.7 million
Q4 Revenue: $2.0 million
These figures would offer several insights:
Seasonal Trends: There may be seasonal patterns, such as a rise in Q4, perhaps due to holiday shopping if it's a consumer-oriented business.
Growth or Decline: By comparing these numbers with those from previous years, one can ascertain if the company is growing, stable, or declining in revenue.
Effect of Initiatives: Any spikes or drops could correlate with specific company initiatives, such as a new product launch or a marketing campaign.
Quarterly Performance: These numbers would likely be used to evaluate how well the company is meeting its financial goals for the year.
Cash Flow Implications: Knowing when revenues are high or low could aid in cash management decisions, such as when to make major purchases or investments.
Stakeholder Communication: Investors, employees, and other stakeholders would be keenly interested in these numbers as a measure of the company's health and growth prospects.
These numbers often serve as the basis for a more detailed financial analysis that considers other factors like costs, market conditions, and strategic initiatives. They're commonly reported in quarterly financial statements and discussed in shareholder meetings.
When we asked, “Presume you have a set of quarterly revenue numbers. We’re all excited about them! Please describe them.” it gave us …
Of course, let's imagine a set of quarterly revenue numbers that everyone in the company is thrilled about.
Q1: $5 million - A robust start to the fiscal year, surpassing expectations by 15%. The numbers show not only strong retainer contracts but also an uptick in new clients, thanks to a revamped marketing strategy.
Q2: $6.5 million - An exceptional second quarter, showing a 30% increase over Q2 of the previous year. The implementation of a new client management system has clearly paid off, leading to more effective upselling and client retention.
Q3: $7 million - A stellar third quarter that broke all the internal records. Despite being a traditionally slow season for the industry, targeted campaigns and well-executed project deliveries helped maintain momentum.
Q4: $8 million - Closing the year on an absolute high note, with revenues up 25% from Q4 last year. Holiday campaigns and year-end contracts came through, and the strategic partnerships developed throughout the year bore fruit.
Yearly Total: $26.5 million - An unprecedented year for the company, not only meeting but far exceeding all revenue goals set for the fiscal year. This performance has the team highly motivated and lays a strong foundation for the coming year.
The air is buzzing with excitement as these numbers roll in. Senior management is already looking at capitalizing on this positive trajectory by exploring new market segments and considering strategic acquisitions. Teams are eagerly awaiting their new targets, and there's a collective sense of pride and accomplishment throughout the organization.
The emotional context implicit in the request changes not just the tone of the AI’s response, but potentially its focus and framing. Another example would be to compare “Describe the team’s performance” with “Praise the team’s outstanding achievements.” The adjective and adverb choices in these prompts could significantly alter the generated output.
As professionals in corporate communication, we should be very aware of the potential to add emotional meaning and context, albeit unintentionally, to AI-generated content. Be mindful of your choice of verbs, adjectives, and adverbs when designing prompts. If you find yourself dissatisfied with the tone of the output, engage in a back-and-forth with the AI to fine-tune it and strike the right chord.
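For those scripting prompts rather than typing them into a chat window, one way to keep emotional framing deliberate is to hold it apart from the core request. This is a hypothetical helper of our own, not something from the paper, built around the two revenue prompts above:

```python
# Hypothetical helper: keep the neutral request separate from any
# emotional framing, so the framing is a deliberate choice rather
# than an accident of wording.

def build_prompt(request, framing=""):
    """Assemble a prompt from a neutral request plus optional emotional framing."""
    parts = [request]
    if framing:
        parts.append(framing)
    parts.append("Please describe them.")
    return " ".join(parts)

neutral = build_prompt("Presume you have a set of quarterly revenue numbers.")
excited = build_prompt(
    "Presume you have a set of quarterly revenue numbers.",
    framing="We're all excited about them!",
)
```

Sending both variants to the model and comparing the outputs, as we did above, is a quick way to see how much the framing shifts tone, focus, and content.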
The OpenAIphone?
Apple’s former Chief Design Officer is working with OpenAI to create AI-specific hardware.
Apple's former Chief Design Officer, Jony Ive, is in substantive discussions with OpenAI to create what's described as the “iPhone of artificial intelligence.” Backed by over $1 billion from SoftBank’s Masayoshi Son, their goal is to fundamentally alter how we interact with AI — paralleling the seismic shift that started with the original iPhone.
While the project’s details are still being fleshed out and no deal has been finalized, it serves as evidence of a new path that companies will explore. It’s too soon to predict whether these efforts will lead to a step change in the hardware we all use or if they will go the way of Google Glass. Regardless, we’ll be paying attention.
We’ll leave you with something cool for photographers: upload a photo into GPT-4, tell the bot that it should play the role of a master landscape / portrait / editorial / etc. photographer, and ask it to critique your photo. You can even upload improved shots based on that advice for additional critique … you now have a photography coach in the palm of your hand.
AI Disclosure: We used generative AI in creating imagery for this post. We also used it selectively as a creator and summarizer of content and as an editor and proofreader. For example, we downloaded our Confluence subscriber counts as a CSV file, gave it to GPT-4 Advanced Data Analysis, and asked it to project growth over a year. It ran a regression analysis, made the prediction, and gave us an X/Y chart of the findings. We gave GPT-4 that image and asked it to create a DALL-E prompt for an image based on the chart to use in a blog post. It gave us this prompt: “An illustrated concept of email subscriber growth. Show a digital plant with emails as leaves, growing alongside a light path representing the projected growth, and a real path showing the actual growth. The two paths should intertwine, with the real path showing moments of rapid growth and occasional slowdowns.” We gave that to DALL-E and got this:
Communication, AI and fused glass... I think they work quite well together!