AI Does Math as Well as Math Olympians

Until now computers have failed to solve mathematical problems. But the AI program AlphaGeometry has succeeded in finding proofs for dozens of theorems from the International Mathematical Olympiad

By Manon Bischoff

A man sitting at a desk, facing a geometric shape — Kenn Brown/MondoWorks

The International Mathematical Olympiad (IMO) is probably the most prestigious competition for preuniversity students. Every year students from around the world compete for its coveted bronze, silver and gold medals. Soon artificial-intelligence programs could be competing with them, too.

In January a team led by Trieu H. Trinh of Google DeepMind and New York University unveiled a new AI program called AlphaGeometry in the journal Nature. The researchers reported that the program was able to solve 25 out of 30 geometry problems from past IMOs—a success rate similar to that of human gold medalists. The AI also found a more general solution to a problem from the 2004 IMO that had escaped the attention of experts.

Over two days students competing in the IMO must solve six problems from different mathematical domains. Some of the problems are so complicated that even experts cannot solve them. They usually have short, elegant solutions but require a lot of creativity, which makes them particularly interesting to AI researchers.

On supporting science journalism

If you're enjoying this article, consider supporting our award-winning journalism by subscribing. By purchasing a subscription you are helping to ensure the future of impactful stories about the discoveries and ideas shaping our world today.

Translating a mathematical proof into a programming language that computers know is a difficult task. There are formal programming languages specifically developed for geometry, but they make little use of methods from other areas of mathematics—so if a proof requires an intermediate step that involves, say, complex numbers, programming languages specialized for geometry cannot be used.

To solve this problem, Trinh and his colleagues created a data set that doesn’t require the translation of human-generated proofs into a formal language. They first had an algorithm generate a set of geometric “premises,” or starting points: for example, a triangle with some of its measurements drawn in and additional points marked along its sides. The researchers then used a deductive algorithm to infer further properties of the triangle, such as which angles matched and which lines were perpendicular to each other. By combining the premises with the derived properties, the researchers created a training data set consisting of theorems and corresponding proofs. For example, a problem could involve proving a certain characteristic of a triangle—say, that two of its angles were equal. The corresponding solution would then consist of the steps that led the deductive algorithm to it.

To solve problems at the level of an IMO, however, AlphaGeometry needed to go further. “The key missing piece is generating new proof terms,” Trinh and his team wrote in their paper. For example, to prove something about a triangle, you might need to introduce new points and lines that weren’t mentioned in the problem—and that is something large language models (LLMs) are well suited to do.

LLMs generate text by calculating the probability of one word following another. Trinh and his team were able to use their database to train AlphaGeometry on theorems and proofs in a similar way. An LLM does not learn the deductive steps involved in solving a problem; that work is still done by other specialized algorithms.Instead the AI model concentrates on finding points, lines, and other useful auxiliary objects.

When AlphaGeometry is given a problem, a deductive algorithm first derives a list of statements about it. If the statement to be proved is not included in that list, the AI gets involved. It might decide to add a fourth point X to a triangle ABC, for example, so that ABCX represents a parallelogram—something that the program learned to do from previous training. In doing so, the AI gives the deductive algorithm new information to work with. This process can be repeated until the AI and the deductive program reach the desired conclusion. “The method sounds plausible and in some ways similar to the training of participants in the International Mathematical Olympiad,” says Fields Medalist Peter Scholze, who has won the gold medal at the IMO three times.

To test AlphaGeometry, the scientists selected 30 geometric problems that have appeared in the IMO since 2000. The program previously used to solve geometric problems, called Wu’s algorithm, managed to solve only 10 correctly, and GPT-4 failed on all of them, but AlphaGeometry solved 25. According to the researchers, the AI outperformed most IMO participants, who solved an average of 15.2 out of 30 problems. (Gold-medal winners solved an average of 25.9 problems correctly.)

When the researchers looked through the AI-generated proofs, they noticed that in the process of solving one problem, the program hadn’t used all the information provided—meaning that AlphaGeometry set out on its own and found a solution to a related but more general theorem. It was also apparent that complicated tasks—those in which IMO participants performed poorly—generally required longer proofs from the AI. The machine, it seems, struggles with the same challenges as humans.

AlphaGeometry can’t yet take part in the IMO, because geometry is only one third of the competition, but Trinh and his colleagues have emphasized that their approach could be applied to other mathematical subdisciplines, such as combinatorics. Who knows—maybe in a few years a nonhuman participant will take part in the IMO for the first time. Maybe it will even win gold.