Amateur Mathematician Solves 60-Year Erdős Problem With ChatGPT
An amateur mathematician leveraged ChatGPT to solve a longstanding Erdős problem, showcasing how AI tools are enabling breakthroughs in mathematical research beyond academia.
April 26, 2026
TL;DR
An amateur mathematician with no formal research background used ChatGPT to crack a combinatorics problem Paul Erdős posed roughly 60 years ago. The story matters less as an "AI beats experts" headline and more as a concrete data point about what changes when non-specialists get access to a capable reasoning partner. Here is what the cost, mechanism, and history tell you about where this kind of thing goes next.
The subscription fee, time cost, and setup overhead Tomon actually faced
The cost structure here is unusual and worth examining directly. The solver, Istvan Tomon, is described as an amateur in the sense that he was not working on this problem professionally within an academic institution. He did not need a university subscription, a research grant, or a collaborator with domain expertise he lacked. ChatGPT at the Plus tier runs $20 per month. The Erdős combinatorics problem had been open since the 1960s. The ratio of problem age to tool cost is absurd.

But the real cost accounting is more nuanced than the subscription fee. The time cost was substantial. Working through a hard combinatorics problem with an AI assistant is not fast. You are iterating on approaches, checking the model's reasoning for errors (and there will be errors), and supplying the domain judgment the model cannot supply on its own. The model does not know which direction is promising. The human does, or learns to, through the iteration. That part is not cheap in hours.

The mechanism: what the model is actually doing in a session like this
Understanding why this worked requires being specific about what a large language model does during a math problem session. The model has been trained on an enormous corpus of mathematical text: papers, textbooks, competition solutions, MathOverflow threads, proofs. When you describe a combinatorics problem, the model is not reasoning from first principles in the way a mathematician reasons. It is pattern-matching against structural similarities in its training data. It can say, in effect, "problems shaped like this have often been approached with technique X." That is a useful signal, even when the model cannot execute technique X correctly on its own.

The specific Erdős problem involved combinatorial geometry, a domain with a large body of competition-adjacent literature. That is exactly the kind of domain where LLM training data is dense. A problem in a more obscure or recently developed subfield would likely have produced worse suggestions.

The second mechanism is what you might call adversarial verification. When Tomon proposed an approach, the model could attempt to poke holes in it. This is not the same as formal proof verification, but it surfaces obvious errors faster than working alone. The model functions as a tireless interlocutor who has read most of the relevant literature. The human supplies the judgment about which objections matter and which are noise.

What the model cannot do: sustain a coherent proof strategy across a very long context without drifting, catch its own subtle errors in symbolic manipulation, or know when a line of reasoning is novel versus when it superficially resembles something that does not apply. The human has to cover those gaps. This is not a criticism of the tool; it is a description of the collaboration.

The last time this kind of access shift happened in mathematics
The relevant historical comparison is not "AI beats grandmaster at chess." It is the arrival of computational tools in mathematics more broadly. When computer algebra systems like Mathematica arrived in the late 1980s, the initial reaction from professional mathematicians was similar to the current AI discourse: this is a toy, it makes mistakes, real mathematicians do not need it. Within a decade, it had changed how entire subfields operated. Not by replacing the mathematicians, but by making certain classes of calculation cheap enough to attempt speculatively. You could check a conjecture numerically before investing weeks in a formal proof. The activation energy for exploration dropped.

The four-color theorem, proved in 1976 with computer assistance, was the flashpoint for that era's version of this argument. Critics argued the proof was not "real" because it relied on exhaustive computer verification of cases no human could check manually. The argument has largely been set aside. The proof is valid. The tool changed what was tractable.

The Tomon result sits in a different category than the four-color proof because it is not primarily a computation. It is a conceptual argument assisted by a language model. That is newer territory. The closer analog might be the role of correspondence in 19th-century mathematics, where non-institutional mathematicians like Ramanujan could participate in frontier work by finding a knowledgeable correspondent. The model is a correspondent who is always available and has read everything.

The pattern from computational tools applies here
Every time a new tool has lowered the cost of mathematical exploration, the initial use cases have been at the edges of established research: problems that professionals deprioritized, amateurs who previously lacked access, and subfields where the tool's training data happened to be dense. That is exactly the profile of this result.
If you want to run a version of this experiment yourself
The earliest you could act on this productively is now, with a 30-day trial horizon. Here is why that timeline is specific. The workflow Tomon demonstrated is not exotic. It requires a ChatGPT subscription, a well-scoped problem from an existing open-problem list, and enough background in the relevant area to evaluate model outputs critically. If you have a technical background in any quantitative field and there is an open problem in that domain you have thought about casually, the cost of running a structured 30-day experiment is $20 plus your time.

The reason to start now rather than waiting for a more capable model: the reasoning you will develop about how to direct an AI collaborator on hard problems is itself valuable and not trivially transferable when you switch models. Learning the failure modes of the current tool is part of the skill.

If your interest is more in evaluating AI for research workflows at an organizational level rather than solo exploration, the Perplexity vs ChatGPT comparison is a reasonable starting point for understanding where each tool sits on the retrieval-versus-reasoning spectrum. For problems like the Erdős case, reasoning depth matters more than retrieval freshness. That distinction should drive your tool choice before you invest the setup time.