Discussion about this post

Vinit Padhye

As one data point on image vs. text: I was looking at an image of a graph and wanted ChatGPT to understand the graph and label certain portions of it. It failed badly. It could describe the image, but it could not manipulate it in a way that gave acceptable results. But when I downloaded the underlying data and asked it to write a program to plot a modified graph with the desired labels, it was able to do so. So even for the same task, the text route was better than the image route.

I wonder why there is such a difference between image and text capabilities. Is it that the text training data was richer? Did we humans put more effort into producing higher-quality text data, or does it have more to do with the difference in capabilities between LLMs and diffusion models?

Shweta

"Try to do any image generation related to actual business/work use cases and you’ll find that DallE/Midjourney etc fail badly." I tried creating specific comic strips for my blog with DallE and Midjourney, using various prompts, but failed miserably.
