ChatGPT now does great images and Gemini is best
And these are just the updates in the last 24 hours
It took less than one day for one of my posts to become outdated. I implied yesterday that Google’s Gemini offerings are not that good compared to the others. Today, Google has made Gemini 2.5 available, and it has shot to the top of the “best available model” leaderboards. But the more important update is that both ChatGPT and Google have become much better at image generation.

Image and video generation have been my least favourite capabilities of Gen AI models. Anyone who’s played with image generation knows that text is heavily mangled and there’s just no easy way to fix that. In other words, anyone who used Gen AI image generation until last week should be flabbergasted by the “infographic” I created above using ChatGPT 4o. (Yes, the same 4o that I was dissing last week. But note: I was using a paid account; more on that below.)
Now comes the mindblowing part.
Imagine that you’ve used a Gen AI image generator, and after a lot of prompting and coaxing you’ve managed to get an image that’s really good, except for one tiny mistake. For example, the “I mplied” in my infographic above. If you’ve done this, you’ll know what’s coming next: you ask the AI to fix just that little thing and keep everything else. And the AI generates a completely new image, washing away all the hard work that went into the previous one.
Well, I told 4o about the spelling error and asked it to fix it, and this is what I got.

What has changed?
Until last week, image generation involved two different AI models. An LLM (a large language model, like GPT 4o) would understand your request and then convert it into a prompt for an image generation model (like DALL-E 2). The problem is that the image model is not a language model and does not understand text. Imagine a smart product manager who understands your business requirements: they talk to you to figure out what image you want and then instruct an illiterate designer to make it. The instruction has to go over a phone call, so the designer is never shown any visuals. And the designer is memoryless: after he sends back an image, he completely forgets about it, and if a new request comes in, he starts from scratch. That’s why image generation had all those problems (bad text, and every new image being completely different).
The new models are “multimodal”: image generation and language understanding are incorporated into the same neural network. In our analogy, the same person now has both skills: a product manager who can understand your business requirements and a designer who can make brilliant images.
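To make the analogy concrete, here is a toy sketch in pure Python. Every name and behaviour here is made up for illustration; this is not any real OpenAI or Google API, just the shape of the two architectures.

```python
# Toy illustration only: invented names, no real API.

def old_pipeline(user_request):
    """Old approach: an LLM writes a text prompt, then a separate
    image model (which can't read) renders it from scratch every time.
    Nothing is remembered between requests."""
    prompt = f"render: {user_request}"              # LLM = "product manager"
    return f"<fresh image for '{prompt}'>"          # image model = memoryless "designer"

class MultimodalModel:
    """New approach: one network understands text *and* generates pixels,
    so it can keep the previous image and fix only the flawed part."""

    def __init__(self):
        self.canvas = None  # the last image the model produced

    def generate(self, user_request):
        # Simulate a generated image that happens to contain a typo.
        self.canvas = f"<image: {user_request} (caption reads 'I mplied')>"
        return self.canvas

    def fix(self, wrong, right):
        # Targeted edit: everything else in the image is preserved.
        self.canvas = self.canvas.replace(wrong, right)
        return self.canvas
```

With the old pipeline, a follow-up like “fix the typo” just becomes another from-scratch render; with the multimodal model, `fix("I mplied", "implied")` leaves the rest of the canvas untouched.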
One warning: The new image generation is currently available via ChatGPT Plus (the $20 per month paid account). As of this morning, the free account is still using the old, bad method of creating images, but I assume that will get fixed soon.
How do you know if you’re getting the new image generation or the old? The old one now says “Created with Dall-E” at the bottom.

And the other update today is that Google has just released its Gemini 2.5 model, and within a day many people are saying it is the best non-thinking model available.
Consider this chart from prediction market1 Polymarket. The orange line indicates that until yesterday the market gave xAI (i.e. Grok) a 90+% chance of having the best model; overnight that crashed, and Google (i.e. Gemini 2.5) took over as the favourite. Gemini 2.5 is not currently available in the free account at gemini.google.com, but for those willing to try a more complex user interface, it is available at aistudio.google.com.
If you don’t know what a prediction market is and why a chart from there is worth paying attention to, you should ask an LLM, shouldn’t you?
I asked it to generate a picture of the solar system with labelled planets, and the output was underwhelming: lots of typos and wrong labels on the planets.
The output told me on its own that it had made mistakes (“glitches like JUPRTAI instead of JUPITER”) and offered to correct them. On asking it to go ahead, it generated a new picture with no Jupiter and two Saturns.
On asking it to find all the mistakes, it found all of them, but asking it to regenerate with corrections led to the same cycle ...
Is Claude 3 Opus actually 2nd on your list?