5 Comments
Feb 19 · edited Feb 19 · Liked by Navin Kabra

As one data point on image vs. text: I was looking at an image of a graph and wanted ChatGPT to understand the graph and label certain portions. It failed badly. It could describe the image, but it could not manipulate it in a way that gave acceptable results. But when I downloaded the underlying data and asked it to write a program to plot the modified graph with the desired labels, it was able to do so. So even for the same task, the text route was better than the image route.

I wonder why there is a difference between image and text capabilities. Is it that the text training data was richer? Did we humans put more effort into producing higher-quality text data, or does it have more to do with the difference in capabilities between LLMs and diffusion models?

author

I do think that LLMs' understanding of text is far, far superior to image generators' understanding of images. So, yes, as you pointed out, the best "image-based" uses of ChatGPT involve converting the problem into a text task.


"Try to do any image generation related to actual business/work use cases and you’ll find that DallE/Midjourney etc fail badly." - I tried creating specific comic strips for my blog with DallE and Midjourney using various prompts, but failed miserably.


I am excited about Sora. I think more people are going to try out their creativity in video and image generation than we think. I am seeing many of my friends who never created image or video content before now doing so. I think people like to see a visual representation of their ideas. And like you said, it’s OK to be slightly inaccurate (the flipping legs in the Sora video). My thought is that people will find it easier to generate such content, and they will.

author

@Mukul,

1. I think this is a temporary jump as people explore something new and interesting. Something similar happened when Dall-E was new. But over time most of those people lost interest, and today, other than the occasional person using Dall-E/Midjourney to illustrate an article or a LinkedIn post, I don't see much usage. It is not mainstream. I think Sora will see a similar trajectory.

2. 50% of company employees are using ChatGPT for their work (worldwide, based on a Salesforce survey of 14k people in 14 countries), even though most companies have prohibited its use. In that sense, ChatGPT is here to stay and will have a huge impact. What fraction of company employees are using image generation? I bet it is a low single-digit percentage. The capabilities of image generation (as it stands today) are just not that useful at work. I believe the same is true of video generation (as it stands today).
