Becoming an Expert LLM User is not "Prompt Engineering"
"Prompt Engineering" has diminishing value but there's much more to Gen AI use expertise
“You must become an expert at using Gen AI tools well,” I regularly tell people. “Yes, Prompt Engineering,” they respond, nodding knowingly.
Not really.
Yes, knowing how to construct your prompt to get the best out of your favourite LLM is important. But that is a small (and decreasing) part of being good at using LLMs. There is much more to it.
Here’s a very short list. I hope to elaborate on each of these points in future posts.
Know Which Model to Use
Depending on what you’re trying to do, you need to know the best model to use. ChatGPT from OpenAI alone has many to pick from: 4o, o1, o3-mini, o3-mini-high, and 4.5. Then there’s Claude from Anthropic (3.5 vs 3.7, Sonnet vs Haiku), and Gemini from Google with 2.0 Flash, 2.0 Pro, 2.0 Flash-Lite, 1.5 Pro, and 2.0 Flash Thinking, plus Grok from Twitter/xAI and DeepSeek. And each of them has multiple extra capabilities, like “Deep Research”, “Web Search”, “Code Generation and Execution”, and “Thinking”, not to mention image interpretation and generation, and analysis of PDFs, Excel files, and more. I’m surely missing a bunch of other important ones (and I haven’t listed o3 and Operator, which are only available with a $200 per month subscription). Most people don’t even know all of these exist. And knowing they exist isn’t enough: you need to know the strengths and weaknesses of each of these models and modes so you can decide which task to give to which model in which mode.
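To make “which task to which model in which mode” concrete, here is a minimal sketch of a personal routing table; the task categories and the model assigned to each are my own illustrative assumptions, not benchmark-backed recommendations:

```python
# Illustrative personal routing table: task type -> model to try first.
# The mapping below is an assumption for the sake of example only.
MODEL_FOR_TASK = {
    "quick_drafting": "gpt-4o",                # fast and cheap
    "deep_reasoning": "o1",                    # slower "thinking" mode
    "long_document_review": "gemini-1.5-pro",  # large context window
    "coding": "claude-3-7-sonnet",             # strong at code
}

def pick_model(task_type: str) -> str:
    """Return the model to try first for a task, with a safe default."""
    return MODEL_FOR_TASK.get(task_type, "gpt-4o")

print(pick_model("deep_reasoning"))  # -> o1
```

The point is not this particular table, which will be stale within months, but that you should have one in your head at all, and keep revising it.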
Be an Effective Manager of Your Team of LLMs
Every one of us now has a team of a dozen assistants—the models and modes I’ve listed above. And we need to learn to manage this team effectively: just as in real life, managing a team is a pain. Some of the models get lazy unless you scold them (and some can even be bribed). All of them are capable of making mistakes, but in different areas, so you have to know which task to give to whom and how to check their work (even if you aren’t an expert in that area yourself; maybe you can get one of them to check another’s work). Some are good at coding, some are good at strategy, some are good at dumb and repetitive tasks. Some cost a lot less, so you can overuse them, while others are expensive consultants you have to use sparingly. Some need detailed and specific instructions, while others can take a high-level goal and run with it. Many get tired after a long session, and you have to give them a break and restart later. Many forget their own capabilities, and you have to remind them that they can do it. It seems hard to believe, but trust me, I’ve experienced every one of these.
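As one concrete example of getting one model to check another’s work, here is a minimal sketch using the OpenAI Python SDK; the two-model pairing and the prompts are my own illustrative assumptions:

```python
# Minimal sketch of the "one model checks another's work" pattern,
# using the OpenAI Python SDK. The model names are illustrative.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def ask(model: str, prompt: str) -> str:
    """Send a single-turn prompt to a model and return its reply."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# A fast model drafts; a slower "reasoning" model reviews the draft.
draft = ask("gpt-4o", "Summarise the key risks of rolling out LLMs in a bank.")
review = ask("o1", f"Check this summary for factual or logical errors:\n\n{draft}")
print(review)
```

The same pattern works across vendors, which is often better: models from different labs tend to fail in different places.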
Use Third-Party Tools
I hate having to create Google Forms (for example, to collect feedback on a course I taught). I always thought the only way to do it was manually. Gemini doesn’t appear to have an option to create a form, and while ChatGPT and others can give me the questions to ask, that was never the bottleneck: actually creating the form was the painful part. Then I found “Builder Plus for GPT”, a third-party plugin in the ChatGPT ecosystem, which took a high-level description from me, created the questions using ChatGPT, and then created a form in my Google Drive. This saved me a lot of time and grunt work, but most people wouldn’t know that this is possible. Guess how I found out about Builder Plus for GPT?[1]
Know When to Outsource
Much more is possible if you can write a program that uses LLMs via APIs. You don’t need to be a programmer yourself, but it is very helpful to have a good idea of what can be done easily via APIs—then you can outsource the building to someone else.
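For a flavour of what becomes easy via APIs, here is a minimal sketch that batch-summarises a folder of text files; this is exactly the kind of small automation you could outsource (the folder name, model, and prompt are placeholders):

```python
# Sketch: batch-summarise every .txt file in a folder via an LLM API.
# Folder name, model choice, and prompt wording are all placeholders.
from pathlib import Path
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

for path in Path("reports").glob("*.txt"):
    text = path.read_text(encoding="utf-8")
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # a cheap model is fine for rote summarisation
        messages=[{
            "role": "user",
            "content": f"Summarise in three bullet points:\n\n{text}",
        }],
    )
    summary = response.choices[0].message.content
    # Write the summary alongside the original file.
    path.with_suffix(".summary.txt").write_text(summary, encoding="utf-8")
```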
Quality Control is the New Moat
Because of the well-known problem of hallucinations, you have to become much better at checking the outputs of your team of LLMs. But LLMs are very different from humans in the kinds of mistakes they make and the confidence with which they make them. This becomes especially tricky because of the problem called the jagged frontier: an LLM that is extremely smart in one area can be very dumb in an adjacent one, and we are not used to this because humans don’t fail that way. Also, you can have a process and a prompt that work fine 10 days in a row and then fail on the 11th without any warning. And unlike other software, LLMs can produce different outputs for the same inputs, so it is much harder to do testing and quality assurance on the outputs. QA of processes involving LLMs is thus a brand-new area of expertise called “Evals”.
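To give a feel for what an eval looks like in practice, here is a deliberately tiny sketch: because the same prompt can produce different outputs on different runs, you run the task many times and measure a pass rate against a machine-checkable rule, rather than asserting a single expected output. The task, model, and pass rule here are illustrative assumptions:

```python
# Tiny illustrative eval: run the same task N times and score the
# outputs against a machine-checkable rule, yielding a pass rate.
import re
from openai import OpenAI

client = OpenAI()
N = 10
passes = 0

for _ in range(N):
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{
            "role": "user",
            "content": "Reply with a date in YYYY-MM-DD format and nothing else.",
        }],
    )
    output = response.choices[0].message.content.strip()
    # Pass rule: output must match the date format exactly.
    if re.fullmatch(r"\d{4}-\d{2}-\d{2}", output):
        passes += 1

print(f"pass rate: {passes}/{N}")  # track over time to catch silent regressions
```

Real eval suites cover many tasks and harder-to-check criteria, but the shift in mindset is the same: from “does it work?” to “how often does it work, and is that rate drifting?”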
There is more, but this should give you an idea of the new skills and knowledge we all have to acquire quickly if we are to successfully surf the Gen AI wave that threatens to drown a lot of existing human activity.
[1] I asked Grok.