Did a Google AI make a new mathematical discovery?
By 2026, LLMs will be co-authors of mathematical papers, says top mathematician
Last week, FunSearch, an AI algorithm from Google, made headlines:
(Note: AI programs like ChatGPT, which have been trained on large amounts of text, are called Large Language Models, or LLMs.)
Was this really a significant event, and what does it mean for the future of mathematics in particular and the capabilities of AI in general? (Note: if you’re not interested in mathematics and the details of this discovery, just skip to the last section, “Implications for the Future”, which should be of general interest.)
Quick Summary
Yes, this is a significant achievement, but perhaps a little less significant than the headlines make it out to be. The AI did not make any new mathematical discoveries by itself: it did not prove a new theorem. What it did was write computer programs that could find better solutions to some well-known, difficult problems. These are important problems on which humans have been trying to improve for decades, so an AI producing better solutions is a significant event. In addition, reading the programs generated by the AI gave actual mathematicians ideas for new theorems. So the mathematicians were able to make genuinely new mathematical discoveries based on ideas they got from the AI.
Background
There are several problems1 for which it is difficult to find solutions, but if someone claims to have found a solution, we can very easily verify whether it is valid.
For example, consider the travelling salesperson problem: if you're given a list of thousands of locations that a salesperson has to visit, and you are told to find the shortest path that covers all the locations, this is a very hard problem. But if someone gives you a specific path, verifying that it is a valid path is easy: just check whether it visits all the locations. And if someone gives you multiple answers, finding out which one is the best of the lot is also easy: just check the total distance travelled by each, and pick the shortest path.
We don't know of any efficient algorithm to find the best answer: we usually use heuristics that give us an OK answer which might be (for example) 1.4 times the best possible solution. Now this problem is important enough for some companies that finding a solution that is 1.35 times the optimal instead of 1.4 times can result in a savings of millions of dollars.
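To make the “easy to verify” part concrete, here’s a tiny Python sketch (the function and variable names are mine, just for illustration). Checking that a proposed tour is valid and computing its length takes a few lines, even though finding the best tour is intractable:

```python
import math

def tour_length(locations, tour):
    """Total round-trip distance of a proposed tour (a list of location indices)."""
    total = 0.0
    for i in range(len(tour)):
        x1, y1 = locations[tour[i]]
        x2, y2 = locations[tour[(i + 1) % len(tour)]]  # wrap around to the start
        total += math.hypot(x2 - x1, y2 - y1)
    return total

def is_valid_tour(locations, tour):
    """A tour is valid if it visits every location exactly once."""
    return sorted(tour) == list(range(len(locations)))

# Comparing two candidate tours is just as easy: the shorter valid one wins.
locations = [(0, 0), (0, 3), (4, 3), (4, 0)]
for tour in ([0, 1, 2, 3], [0, 2, 1, 3]):
    print(tour, is_valid_tour(locations, tour), round(tour_length(locations, tour), 2))
```

Finding the shortest tour among all possible orderings, by contrast, blows up combinatorially as the number of locations grows.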
And there are many such problems in the world.
What did the AI actually do?
LLMs are well-suited to help us solve problems like these where coming up with a new/good solution is difficult but verifying it is easy.
Here is a simplified description of what Google did. They first wrote a bunch of programs, each of which produced some solution to the problem being tackled. Each of these solutions was a bad solution (obviously, because we don’t know how to get good solutions to these problems, remember?). Then they gave each program to the LLM and asked it to randomly modify the program so that it produces a different solution to the same problem. If the modified program generates a better solution2 (which, remember, is easy to verify), we keep the new program; otherwise we throw it away. We repeat this process tens of thousands of times and, in the end, if we’re lucky, we end up with a program that produces a better solution than anything we’ve ever seen before. Not the optimum solution (because this is a really hard problem), but better than the solutions humans had produced over decades of trying.
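A rough sketch of that loop in Python might look like this. Everything here is schematic: llm_mutate and score are stand-ins for the LLM call and the problem-specific evaluator, and none of this is actual FunSearch code.

```python
import random

def evolve(initial_programs, llm_mutate, score, iterations=10_000):
    """Keep a pool of candidate programs; repeatedly ask the LLM to mutate one
    and keep the mutant only if it scores at least as well as the best so far."""
    pool = [(score(p), p) for p in initial_programs]
    for _ in range(iterations):
        _, parent = random.choice(pool)    # pick a program to mutate
        child = llm_mutate(parent)         # the LLM proposes a modified program
        child_score = score(child)         # the cheap step: verify and score its output
        best_score = max(s for s, _ in pool)
        if child_score >= best_score:      # survival of the fittest
            pool.append((child_score, child))
    return max(pool, key=lambda pair: pair[0])  # best (score, program) found
```

The real system is more elaborate (see footnote 2: it is a genetic algorithm that maintains whole populations of programs), but the shape is the same: propose with the LLM, verify cheaply, keep what improves.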
The LLM here did not really do much mathematical thinking and did not prove any theorems. It just helped generate random new programs and was part of an evolutionary process that, after a long time, finds good-quality solutions via random mutation followed by survival of the fittest. Keep in mind, though, that this could not have been done without an LLM: randomly modifying an existing program so that it still produces a different valid solution to the same problem is not something we previously knew how to do easily.
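What does “ask the LLM to modify the program” look like in practice? Roughly, you put an existing candidate program into a prompt and ask for a variation. The sketch below is purely illustrative: call_llm is a placeholder for whatever LLM API you use, and the prompt wording is mine, not DeepMind’s.

```python
def llm_mutate(parent_program: str, call_llm) -> str:
    """Ask an LLM for a modified version of an existing candidate program."""
    prompt = (
        "Below is a Python function that produces a candidate solution to a hard "
        "combinatorial problem. Write a modified version of the function with the "
        "same signature that may produce a better solution.\n\n"
        f"{parent_program}\n\n"
        "Return only the new function definition."
    )
    return call_llm(prompt)  # the returned code is then compiled, run, and scored
```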
They did this with two different well-known problems (the cap-set problem and the bin-packing problem) and it worked well enough in both cases, suggesting that the technique may be widely applicable.
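To give a flavour of what “a program that produces a solution” means here, consider bin packing: items of different sizes have to be packed into as few fixed-size bins as possible. Below is the classic first-fit heuristic in Python, a toy example of the kind of program whose variations get evolved; it is not the heuristic FunSearch actually discovered.

```python
def first_fit(items, bin_capacity):
    """Place each item into the first open bin with room; open a new bin otherwise.
    Fewer bins used = better solution, which makes candidates easy to compare."""
    bins = []  # remaining capacity of each open bin
    for item in items:
        for i, remaining in enumerate(bins):
            if item <= remaining:
                bins[i] -= item
                break
        else:
            bins.append(bin_capacity - item)  # nothing fits: open a new bin
    return len(bins)

print(first_fit([4, 8, 1, 4, 2, 1, 7, 3], bin_capacity=10))  # -> 3 bins
```

In the actual work, it was essentially this kind of placement rule that the LLM mutated and the evaluator scored, just as in the loop sketched earlier.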
In addition, mathematicians studying the programs generated by the LLM for the cap-set problem found some interesting ideas in them and were able to use those ideas to prove a new theorem in this area.
Implications for the Future
As far as real mathematical discoveries and theorems are concerned, LLMs are likely to start contributing soon. Terence Tao, one of the top mathematicians in the world today, has this to say:
2023-level AI can already generate suggestive hints and promising leads to a working mathematician and participate actively in the decision-making process. When integrated with tools such as formal proof verifiers, internet search, and symbolic math packages, I expect, say, 2026-level AI, when used properly, will be a trustworthy co-author in mathematical research, and in many other fields as well.
Questions for you:
Are you using an LLM in a way that lets it “participate actively in the decision-making process”? If the LLM can help Terence Tao, I’m sure it can help you.
Are you thinking of ways in which an LLM can be integrated with the other tools of your trade so that by 2026, the AI can be your co-worker and pair programmer?
1. Computer science folks would know these as NP-hard/NP-complete problems.
2. This is oversimplified. They were using a genetic algorithm, and the LLM was being used in the step where the next generation of programs is generated from the current generation.