People with a tech background can probably skip this post.
LLM, GenAI, GPT, Foundation Model, RLHF, RAG: it is easy to get lost in all the new terms used when talking about modern AI. This post is a quick introduction to the most important ones you should know about. The good news is that hiding behind these formidable acronyms are fairly simple concepts.
GenAI aka Generative AI
When AI is used to create (generate) new things, like an image, an essay, or an answer to a question, it is called Generative AI, or GenAI.
ChatGPT (for generating text), Dall-E (for generating images), and Sora (for generating videos) are all examples of GenAI. Examples of AI which are not GenAI are Spotify recommendations, self-driving cars, and face recognition.
LLM aka Large Language Model
To understand what a Large Language Model is, we need to understand a model, then a language model, and finally a large language model.
Model: Most modern AI programs work as follows: first they’re given a large amount of training data. Then a “training program” spends a large amount of time analysing the training data, looking for useful patterns in it and trying to make sense of those patterns. All the patterns learned by this program are represented as numbers, and this set of numbers (called the weights or the parameters) is stored in the memory of the computer. This set of weights is called a model (because it is trying to model all of the training data using a much smaller number of patterns).
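To make this concrete, here is a toy sketch in Python. This is not how real LLMs are trained, and the numbers are made up, but the principle is the same: the “training” step looks at the data and produces a handful of weights, and those weights are the model.

```python
# A toy illustration: "training" finds a small set of numbers (weights)
# that capture the patterns in the training data.
import numpy as np

# Made-up training data: house sizes (sq ft) and their prices (in lakhs).
sizes = np.array([500, 1000, 1500, 2000, 2500])
prices = np.array([50, 95, 160, 198, 251])

# "Training": find the weights of a straight line that best fits the data.
slope, intercept = np.polyfit(sizes, prices, deg=1)

# The "model" is just these two numbers. An LLM is the same idea,
# except it has billions of such weights instead of two.
print(f"model weights: slope={slope:.3f}, intercept={intercept:.2f}")

# Using the model: answer a question about data it has never seen.
print(f"predicted price for 1800 sq ft: {slope * 1800 + intercept:.1f}")
```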
In short, AI programs or AI algorithms are called models.
Language Model: If the training data given to an AI program consists of English text (or text in another human language; these are called natural languages to distinguish them from programming languages), then the model that is learned by the AI program is called a language model.
LLM: Thus, a language model is an AI program whose purpose is to understand and generate a human natural language like English. Before ChatGPT, language models used to be (relatively) smaller; the one behind Google Translate, for example. Since ChatGPT, however, we’ve found that there are major advantages to making the model huge (both in terms of the size of its training data set, and the number of weights used by the system to learn the patterns in the language). So most new language models are now “large”.
OpenAI’s ChatGPT, Google’s Gemini, Anthropic’s Claude, and Meta’s Meta.AI are all examples of LLMs.
In fact, “large” language models are humongous: for example, it is estimated that ChatGPT4’s training data would fill up a 650km bookshelf if it were printed as books; and the time it took to complete its training was equivalent to 7 million years on a modern laptop. And the number of parameters in the model (i.e. the weights in its neurons) would fill an Excel spreadsheet that stretches over 30k football fields. That’s huge.
AI vs ML
AI stands for Artificial Intelligence. ML is Machine Learning. This distinction used to be important until a few years ago, but it is fast becoming irrelevant. You can think of both of them as the same thing now.
GPT vs ChatGPT
GPT stands for “Generative Pre-Trained Transformer” and is the name of the technology first used by OpenAI. Specific AI models built using this technology are called GPT2, GPT3, GPT3.5, GPT4 and so on (obviously, the ones with higher numbers are newer). You normally don’t use these models directly, because that involves writing a program.
ChatGPT is one specific program written using GPT models. The thing you use when you go to chatgpt.com or the ChatGPT mobile app is “ChatGPT” and it uses the GPT 4o-mini model by default. Microsoft’s Bing Copilots also use the same models created by OpenAI. Meta.AI uses models called Llama-3.1, Llama-3, and Llama-2. Anthropic’s Claude uses models called Sonnet, Opus, and Haiku (each of which comes in version numbers 2 and 3).
In short, normal users chat with ChatGPT or Meta.AI or Claude, while programmers use GPT-4o or Llama-3.1 or Haiku-3.
ChatGPT vs OpenAI
OpenAI is the name of the company and ChatGPT is one of its products. Dall-E and Sora are its other products.
ChatGPT-4 vs ChatGPT-4o vs ChatGPT-4o-mini
Just as we have versions of Windows¹, OpenAI also has versions of their GPT technology. GPT2 was released in 2019. GPT3 was released in 2020. GPT3.5 was released in 2022. GPT4 was released in 2023. GPT4 is the best model (in terms of intelligence/reasoning). Even though GPT 4o and GPT 4o-mini were released later, they are, in most cases, less intelligent than GPT4. However, they are much faster (and if you are paying for them, they are much cheaper to use than GPT4). Of these, as the name suggests, GPT 4o-mini is smaller and faster but less intelligent than GPT 4o.
ChatGPT, the web or app-based chat interface that most people are familiar with, comes in three variations: 1) The free version without creating an account gives you access to GPT 4o-mini. 2) The free version that you can use after creating an account and logging in gives you access to GPT 4o and GPT 4o-mini. You get only limited access to GPT 4o: you can only ask it a certain number of questions per day, after which you have to use 4o-mini. 3) The version that requires a paid subscription ($20 per month (plus GST)) gives you GPT 4, GPT 4o, and GPT 4o-mini. (You might see references to GPT 3.5 in articles: but that is an older version that has been disabled from the web interface.)
Note: GPT 4 is a little better than GPT 4o, but not by much. However, the paid version does allow access to a few other capabilities (like creating customized GPTs (see below)) which are worth the money for advanced users. For most regular users, the free version is good enough.
Mistral, LLaMa/Meta.AI, Grok
In addition to ChatGPT, Gemini, and Claude mentioned above, there are 3 other LLMs that you might hear about. They’re currently not as good as the big three but they are interesting for other reasons.
Meta.AI is the LLM built by Meta/Facebook (the company that owns Instagram). The underlying model is called Llama. It is interesting because it is currently the best model that is partially open—all the other top ones are closed (the difference between “open” and “closed” and why it matters is discussed later). And Meta/Facebook is a formidable company, so their progress in this space is worth watching.
Mistral AI is a French company which releases the Mistral LLMs. These are interesting because they are the best open-source LLMs currently available for anybody to download and use without any restrictions. Both the source code and the models are open-sourced. Mistral has recently released some closed-source models, so it is unclear whether they will continue to provide the world with high-quality open-source models.
Grok is a closed-source LLM from xAI, Elon Musk’s AI company. It is integrated with Twitter (available only to paid users). It is not a very good LLM but you can never rule out anything that Musk does. In addition, anything Musk does is always in the news, so you will keep hearing about Grok from time to time.
Prompt Engineering
Technically, when you’re interacting with an LLM, the text you type is called a prompt. And learning how to write prompts to get better answers from the LLM is called prompt engineering.
The most important thing to learn is that you should use long and detailed prompts. Here’s another, more detailed guide to better prompting. Here are OpenAI’s own recommendations on how to get the best out of ChatGPT. These same recommendations work well with all other LLMs. You can also search the internet for “Prompt Engineering” and learn a lot of interesting tips on how to get better at prompting.
You should also remember that all LLMs are lazy and that you should not accept lazy answers from them. Force them to do better. In fact, you can just tell the LLM “do better” after any answer and it will give a better answer. Sometimes, you can repeat this 3 or 4 times and get better answers each time.
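For programmers, this trick can even be automated. Here is a minimal sketch using OpenAI’s Python library (assuming you have the `openai` package installed and an API key set up); the prompt and the loop count are purely illustrative:

```python
# A minimal sketch of the "do better" trick, automated.
# Assumes `pip install openai` and an OPENAI_API_KEY environment variable.
from openai import OpenAI

client = OpenAI()
messages = [{"role": "user", "content": "Write a haiku about the monsoon in Pune."}]

for attempt in range(3):
    reply = client.chat.completions.create(model="gpt-4o", messages=messages)
    answer = reply.choices[0].message.content
    print(f"--- attempt {attempt + 1} ---\n{answer}\n")
    # Feed the answer back into the conversation and simply ask for better.
    messages.append({"role": "assistant", "content": answer})
    messages.append({"role": "user", "content": "Do better."})
```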
Hallucinations
LLMs regularly generate fake output that is completely made up but sounds very plausible. For example, when writing about research, an LLM might generate references to non-existent papers. Or it might generate fake movie reviews attributed to famous reviewers. Or explain physics concepts that don’t exist. These are called hallucinations. In general, you can never be sure that an LLM isn’t convincingly lying to you.
Context Window
Every LLM has an upper limit on the number of words it can process/remember at a time. This number is different for different LLM models: for example, ChatGPT4 (the paid version) can only handle approximately 25,000 words at a time. So if you try to upload a document larger than that, it will refuse to answer.
The technology currently used by all LLMs (called the “transformer architecture”) imposes an upper limit on the number of words an LLM can process and remember at one time. This upper limit has to be chosen by the creators of the model at the time the model is trained. So, different models have different limits. For any given question, if you add up the number of words in the question and the number of words in the answer, the total needs to be less than this limit. This limit is called the context window.
(I lied a little bit here. The context window is not specified in the number of words. It is specified in terms of a technical concept called tokens. I’ve explained tokens later in this article, but for now you can think of a token as a small word or part of a long word. Typically, 75 words of normal English text get converted to about 100 tokens. Or, conversely, a token is 3/4ths of a word on average.)
Knowledge Cutoff
Can an LLM access the internet and answer questions about the latest events and news? The answer differs depending on whether you’re talking about the model (for example, GPT-4o) or the program (ChatGPT).
First let’s talk about whether a model can answer such questions.
A model, the set of weights, is just a collection of the common patterns found in the training data that was used to create the model. The model itself does not have access to either the training data or the internet. And the training data was necessarily collected up to some specific date. This specific date is called the knowledge cutoff of that model.
So, if the training data used to create the model only contained information until, say April 2024, the model itself will not be able to answer questions about anything that happened in May 2024 or later.
Similarly, if you ask it questions about yourself, it might not be able to answer them (unless you are a famous personality). Why? Because although the training data probably contained some information about you from some social media websites, you are not important enough for the model to remember as one of the common patterns in the training data. By contrast, Shakespeare and Taylor Swift occur so many times in so many different places in its training data that it is able to answer questions about them purely based on the patterns it remembered in its weights.
So for example, if you’re using Claude, it cannot answer questions about Olympic events happening today. And it cannot answer questions about my neighbour (because he’s not a celebrity).
However, ChatGPT the program is a different thing. Suppose you are using ChatGPT the program and you’re logged in, so it is using the GPT-4o model. OpenAI has built ChatGPT the program in such a way that if you ask it questions about a current event or a specific person, the program searches the web (just like a Google search), fetches the first few documents that it thinks are most relevant to the question, and then includes all the data from those documents in the question that it sends to GPT-4o the model, and then asks it to answer the question based on the information that was included. In other words, the actual prompt that gets sent to GPT-4o will be something like “Consider this data <contents of web page 1>, <contents of web page 2>, <contents of web page 3>. Using this data, answer this question <the actual question you asked>.”
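Here is a rough sketch, in Python, of what that might look like. To be clear, OpenAI’s actual implementation is not public; `search_the_web` and `fetch_page_text` below are hypothetical helpers used purely for illustration:

```python
# A rough sketch of what ChatGPT-the-program does behind the scenes.
# `search_the_web` and `fetch_page_text` are hypothetical helpers;
# OpenAI's real implementation is not public.

def answer_with_web_search(question: str, ask_model) -> str:
    urls = search_the_web(question)[:3]          # like a Google search
    pages = [fetch_page_text(u) for u in urls]   # grab each page's contents

    # Stuff the fetched documents into the prompt sent to the model.
    augmented_prompt = (
        "Consider this data:\n"
        + "\n\n".join(pages)
        + f"\n\nUsing this data, answer this question: {question}"
    )
    return ask_model(augmented_prompt)           # e.g. a call to GPT-4o
```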
See also the section on “RAG” or “Retrieval Augmented Generation” later in this article: RAG is what’s happening when you ask questions about things after the knowledge cutoff date.
All models have a knowledge cutoff. But some of the programs that use these models can fetch data from the internet to still answer questions about things after the knowledge cutoff.
At the time this article was written, ChatGPT-4, ChatGPT-4o, Gemini, Bing Copilot, and Meta.AI were all able to access the internet in this way. Claude and ChatGPT-4o-mini were not able to access the net.
(Customized) GPTs
OpenAI allows you to create customized versions of ChatGPT, and in a monumentally stupid naming decision, these customized ChatGPT versions are called GPTs. You can give each customized GPT special instructions and you can share the customized GPT with other users (only users with paid accounts). Customized GPTs can even use APIs to call other programs: for example, you can build a customized GPT to automatically read your Gmail (using APIs made available by Google) and to even automatically send email (again using Gmail APIs).
This customized GPT is different from the GPT3.5, GPT4, GPT4o “models” that I mentioned earlier. Yes, I know it is confusing—complain to Sam Altman. To prevent confusion I will not refer to customized GPTs again in this article.
Model
This is explained under LLM above. Basically, a model is a set of numbers (called weights) which captures the intelligence of a modern AI program. To be able to use the AI program all you need is this set of numbers. The model itself is created by another program by learning patterns in a training data set that was provided to the AI.
API
API stands for Application Programming Interface. An API is a method by which one software program can talk to another software program. For example, to allow you to use ChatGPT-like functionality inside a program, OpenAI has made GPT3.5 and GPT4 APIs available. You have to pay separately to be able to use these APIs in your programs (but the cost is quite low).
For example, if you want to create your own app or website that needs to use ChatGPT-like functionality internally, your program would need to use OpenAI’s APIs.
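Here is roughly what such a program looks like using OpenAI’s Python library (a minimal sketch, assuming the `openai` package is installed and an API key is set in the environment):

```python
# A minimal sketch of calling OpenAI's API from your own program.
# Assumes `pip install openai` and an OPENAI_API_KEY environment variable.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",  # or "gpt-4o-mini", "gpt-3.5-turbo", etc.
    messages=[{"role": "user", "content": "Explain APIs in one sentence."}],
)
print(response.choices[0].message.content)
```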
Agents
In the GenAI world, the word “agent” means a program that can use GenAI and other tools to achieve some goal you’ve given it. For example, you can give an agent the goal to “write a working program that passes these tests”. What the agent needs to do is the following (a sketch in code follows the list):
1. Use an LLM to read and understand the tests
2. Ask the LLM to write a program that would pass the tests
3. Use a compiler to compile the program
4. Use a test runner tool to run the program against the given tests
5. Look at which tests failed and send this information to the LLM and ask it to modify the program appropriately
6. Repeat steps 2 to 5 until all the tests pass
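Here is what that loop might look like as a Python sketch. Note that `ask_llm`, `compile_program`, and `run_tests` are hypothetical helpers for illustration, not real library functions:

```python
# A sketch of the agent loop described above. `ask_llm`, `compile_program`,
# and `run_tests` are hypothetical helpers, not a real library.

def coding_agent(tests: str, max_rounds: int = 10) -> str:
    # Steps 1 and 2: have the LLM read the tests and write a first attempt.
    program = ask_llm(f"Read these tests and write a program that passes them:\n{tests}")

    for _ in range(max_rounds):
        # Steps 3 and 4: compile the program and run it against the tests.
        binary = compile_program(program)
        failures = run_tests(binary, tests)
        if not failures:
            return program  # all tests pass; goal achieved

        # Step 5: tell the LLM what failed and ask it to fix the program.
        program = ask_llm(
            f"This program:\n{program}\nfailed these tests:\n{failures}\n"
            "Modify the program so that these tests pass."
        )
    raise RuntimeError("agent gave up after too many rounds")
```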
So far, you’ve probably not heard the word “agent” in connection with GenAI. But you will soon start hearing a lot more of it. For example, check out Devin, the software program writing agent.
Tokens
AI programs don’t actually understand English. They only understand numbers. So, what we would like to do is to assign a unique number to each English word. Then, to make an AI program work, we first convert all the input text into numbers, do all the complex AI analysis on the numbers, output the answer as numbers, and then convert those numbers back into English text. These numbers are called tokens.
There is one problem though: the potential number of “words” in English is infinite, once you take into account misspellings, the various prefixes and suffixes (like -ed, -ing, -ly, -ingly, -ical, -ically, -ingically and so on), proper nouns, names of chemicals, etc.
So we use a trick: we give unique tokens to short English words and common groups of letters, and for anything more complex and longer we compose them using the existing tokens. Thus small and common words like “the”, “and”, and “word” are each one single token. However, a long word like “individually” is broken up like this: “ind” “ivid”, and “ually”. Uncommon words and proper nouns are broken down even further. “Navin Kabra” becomes “nav”, “in”, “k”, “abra”.
How exactly an LLM breaks up words into tokens is not important because the conversion happens automatically. You just need to know about the concept of tokens because the context window of LLMs is usually specified in terms of number of tokens, and also if you start reading up on LLMs on the web, you’ll encounter the term “tokens”.
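If you are curious, you can see tokenization in action using OpenAI’s `tiktoken` library (a minimal sketch; note that the exact splits it produces may differ from the illustrative examples above):

```python
# See how a word gets broken into tokens. Assumes `pip install tiktoken`.
# Other LLMs use different but similar tokenization schemes.
import tiktoken

enc = tiktoken.encoding_for_model("gpt-4")
tokens = enc.encode("individually")
print(tokens)                             # the token numbers
print([enc.decode([t]) for t in tokens])  # the pieces each number stands for
```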
RAG aka Retrieval Augmented Generation
This is when an LLM is used to answer questions related to a collection of documents that were not in the LLM’s training set.
Suppose you have a large number of documents related to customer projects. And you want to answer the question, “Summarize the use of green energy being deployed in the 3 most recent energy projects.” Since these are your private documents, ChatGPT knows nothing about them, so it can’t directly answer this question. What you need to do is use a sophisticated keyword search to find the 3 documents related to the most recent energy projects and then send these documents to an LLM like ChatGPT for summarization. So your generation (of the summary) is being augmented by documents that were retrieved using a search engine.
That’s RAG.

But note that “energy projects” is a very fuzzy topic, and doing a keyword search on “energy” might not get you all the relevant documents: you really need an LLM-like search to shortlist the relevant documents, and then an LLM to read those documents and estimate the percentage of green energy being deployed in them. The way this is usually solved is to implement something called a “vector database”, as the sketch below illustrates.
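Here is a minimal do-it-yourself sketch of this idea in Python, using OpenAI’s embeddings API as the “LLM-like search” and doing the vector database by hand (the documents and model names are purely illustrative):

```python
# A minimal RAG sketch: embed the documents, embed the question,
# pick the closest documents, and stuff them into the prompt.
# Assumes `pip install openai numpy` and an OPENAI_API_KEY.
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(text: str) -> np.ndarray:
    # Turn text into a vector of numbers; similar texts get similar vectors.
    resp = client.embeddings.create(model="text-embedding-3-small", input=text)
    return np.array(resp.data[0].embedding)

documents = ["...project report 1...", "...project report 2...", "...project report 3..."]
question = "Summarize the use of green energy in the 3 most recent energy projects."

# "Vector search": rank documents by similarity to the question.
doc_vectors = [embed(d) for d in documents]
q_vector = embed(question)
scores = [np.dot(q_vector, v) / (np.linalg.norm(q_vector) * np.linalg.norm(v))
          for v in doc_vectors]
top_docs = [d for _, d in sorted(zip(scores, documents), reverse=True)[:3]]

# "Augmented generation": ask the LLM to answer using the retrieved documents.
prompt = "Consider this data:\n" + "\n\n".join(top_docs) + f"\n\nAnswer: {question}"
answer = client.chat.completions.create(
    model="gpt-4o", messages=[{"role": "user", "content": prompt}]
)
print(answer.choices[0].message.content)
```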
NotebookLM from Google is a great program that you can use for doing RAG on your own documents.
System Prompt
Whenever you ask an LLM a question, a standard set of instructions for the LLM is automatically added before your question. These instructions are carefully crafted to ensure that the LLM’s response to you is helpful, honest, and harmless. Specifically, they include instructions on how to handle sensitive topics, how to not be biased, and what kind of help is expected of it. Here is an example of a system prompt.
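When programmers use the API instead of the chat interface, they get to write the system prompt themselves. A minimal sketch (the hidden system prompts of products like ChatGPT are much longer than this):

```python
# The system prompt is just a message with the "system" role that is
# sent before the user's question. Assumes `pip install openai`.
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        # The system prompt: standing instructions prepended to the chat.
        {"role": "system", "content": "You are a helpful assistant. "
         "Decline requests for dangerous information. Answer concisely."},
        # The user's actual question.
        {"role": "user", "content": "What is a system prompt?"},
    ],
)
print(response.choices[0].message.content)
```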
Unnecessary Details
More Details on the Difference Between AI and GenAI
Until a few years ago, most AI programs were used to detect or classify things. Face recognition (whose face is this?), Netflix recommendation (will Navin like this movie?), and self-driving cars (is there an obstacle in my path that requires me to change my direction or speed?) are all examples of detection/classification. The rise of programs like ChatGPT and Dall-E has given prominence to a new class of AI programs: those which are used to create new text or images or video or sound. So GenAI refers to these programs which are used to generate new content that didn’t exist earlier. Examples of GenAI: ChatGPT, Bard/Gemini, Dall-E, Midjourney, Sora.
The Bitter Lesson Regarding Large Language Models
As described above, a language model is an AI program whose purpose is to understand the grammar and meaning of sentences in English (or any other language), to generate new text that is grammatically correct and says something sensible, and/or to translate text into another language without altering the meaning.
20 years ago, language models were built by creating programs that understood English grammar. They would recognize nouns and verbs and other parts of speech and try to understand meaning by recognizing subjects and objects and clauses. 10 years ago, rude young maths people decided to chuck all that and just give the AI large amounts of text to analyze purely statistically, without any attempt at teaching it grammar or language. This approach resulted in success far beyond any of the earlier attempts, and it is now the dominant approach to language models in particular, and to natural language processing in general.
This is called the bitter lesson: it was hard to accept that dumb statistics are better at understanding human language than sophisticated programs built on a deep understanding of grammar and semantic rules developed over decades (especially for the language experts who spent decades of their lives building those programs, which are now obsolete).
More Details on the Difference Between AI and ML
AI vs ML: This distinction used to be important until a few years ago but its importance is decreasing, so you can skip this. AI—Artificial Intelligence—can be defined as getting a machine (in reality, a computer program) to display human-like intelligence. ML—Machine Learning—is a subset of AI in which the machine/program automatically learns to think/behave like an intelligent human by looking at examples (i.e. training data) of intelligent behaviour. By contrast, Deep Blue, IBM’s chess-playing computer that beat Garry Kasparov, had an algorithm that IBM’s programmers had programmed. It hadn’t “learnt” anything by itself. So it wasn’t an example of machine learning. However, in the last few years, almost all new AI uses ML, so the term ML is getting less important. You can safely think of AI and ML as synonymous now.
Foundation Model
A foundation model is an AI model that is capable of performing a wide range of general tasks and hence can be adapted for a wide range of diverse use cases. AI tools for specific use cases are usually built on top of such foundation models.
GPT3.5 and GPT4 are two versions of a foundation model for language-based tasks and ChatGPT is one of the applications built on top of it. Microsoft’s various Copilots are other applications built on top of GPT3.5.
Evo is a “DNA foundation model” and it is expected that biologists will now use it for many different tasks related to DNA, RNA, and proteins: protein folding, protein function prediction, predicting effects of mutations in RNA and DNA, etc.
Open vs Closed models
One tricky question related to GenAI models is: can you incorporate them in your own software/business? If yes, do you have to pay for it? And do you need the model owner’s permission for this? Can they legally prohibit you from using that model if they don’t like you or your business?
Why is this important? Imagine if all the roads in a country were owned by one company and it could charge whatever price it wanted to let you use the roads. And it could arbitrarily decide that some companies (which the owner doesn’t like for some reason) could not use the roads at all. Obviously, this would be terrible for the country, right?
Some people feel that Gen AI technology will be as important as roads in the future and as a result, there should be no restrictions on who is allowed to use them. i.e. they feel that this technology should be “open” to anyone. Other people feel that Gen AI technology will be more similar to mobile phone service and it is OK if the technology is owned by private companies who can choose how much to charge for their technology and who can choose whether to disallow some people/companies from using their technology—as long as they compete with each other. These people feel that it is OK if the technology is “closed” (i.e. in the control of private companies).
(Note: I am simplifying a fairly complex topic here. To understand “open” vs “closed” software better, check out this Wikipedia entry.)
Is Gen AI open or closed? The important pieces in a Gen AI model are:
The data used for training the model
The actual program (source code) that created the model by using the training data
The model itself: this is just a large set of numbers (the weights or parameters). To actually use a Gen AI model, you don’t really need the data from #1 or the program from #2. If you get the model (i.e. the file containing all the weights), you can run the model on your own hardware (a sketch of this follows the list)
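For example, here is a minimal sketch of running open weights locally using the Hugging Face `transformers` library (assuming `pip install transformers torch`, that you have been granted access to the weights, and that your machine has enough memory; the model name is Meta’s partially open Llama):

```python
# Running downloaded model weights on your own hardware.
# Assumes `pip install transformers torch` and access to the gated weights.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-3.1-8B-Instruct"  # Meta's partially open model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# The weights are now in your computer's memory; no API, no internet needed.
inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```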
For each of the above, there are 3 different questions you can ask:
Is it available to anybody for download?
Can you use it for free or do you have to pay for it?
Can the owner of the data/program/model legally prohibit you from using it for whatever reason?
Yes or No answers to these questions give you various different definitions of whether a model is open or closed.
For example, OpenAI was started with the intention of building AI models and open-sourcing the programs and the models. It did this with its initial models, but starting with GPT3, OpenAI switched to a closed model, where neither the dataset, nor the source code, nor the model have been released. OpenAI/ChatGPT, Anthropic/Claude, and Google/Gemini are all “closed”. Meta.AI is partially open: the model weights have been released and are available for most commercial uses (but large companies have to ask for permission).
Anything Else?
If there’s any other terminology related to Gen AI / LLMs / ChatGPT that you’re confused about, please contact me or leave a comment below and I’ll try to answer it. And if you’ve found this useful, please share it with your friends.
¹ Question: How does Bill Gates count to ten? Answer: 1, 2, 3, 95, 98, NT, 2000, ME, XP, Vista, 7, 8, 10