
Amateur Mathematician Solves 60-Year Erdős Problem With ChatGPT

An amateur mathematician leveraged ChatGPT to solve a longstanding Erdős problem, showcasing how AI tools are enabling breakthroughs in mathematical research beyond academia.

April 26, 2026


TL;DR

An amateur mathematician with no formal research background used ChatGPT to crack a combinatorics problem Paul Erdős posed roughly 60 years ago. The story matters less as an "AI beats experts" headline and more as a concrete data point about what changes when non-specialists get access to a capable reasoning partner. Here is what the cost, mechanism, and history tell you about where this kind of thing goes next.

You are trying to decide whether AI math tools are a novelty or something that should change how you think about research workflows. You have seen the headline, you are skeptical of the framing, but you are also not sure you can dismiss it. The fork is this: either this is a one-off story about a lucky amateur and a cooperative chatbot, or it is an early signal that the barrier between "person who does research" and "person who cannot" has measurably moved. One of those readings has implications for how you spend your time. The other does not.

The subscription fee, time cost, and setup overhead Tomon actually faced

The cost structure here is unusual and worth examining directly. The solver, Istvan Tomon, is an amateur in the sense that he was not working on this problem professionally within an academic institution. He did not need a university subscription, a research grant, or a collaborator with domain expertise he lacked. ChatGPT at the Plus tier runs $20 per month. The Erdős combinatorics problem had been open since the 1960s. The ratio of problem age to tool cost is absurd.

But the real cost accounting is more nuanced than the subscription fee. The time cost was substantial. Working through a hard combinatorics problem with an AI assistant is not fast. You are iterating on approaches, checking the model's reasoning for errors (and there will be errors), and supplying the domain judgment the model cannot supply on its own. The model does not know which direction is promising. The human does, or learns to, through the iteration. That part is not cheap in hours.

60 years: how long the Erdős problem had been open before this solution. Source: Hacker News.

The setup cost was essentially zero. No API configuration, no fine-tuning, no specialized math environment. Tomon used the standard chat interface. This is the part that does not get enough attention: the accessibility of the tool lowered the activation energy for attempting the problem in the first place. That is a different kind of cost reduction than raw compute.

Maintenance cost is also relevant if you are thinking about replicating this workflow. The model will confidently assert incorrect things. Verifying outputs against a problem you only partially understand is slow and error-prone. The human in the loop still needs enough mathematical literacy to catch the model when it is wrong about something subtle. Tomon had that. Someone with no mathematical background at all would not have closed the loop.

The switching cost from "traditional research" to "AI-assisted research" for problems like this is low because the traditional approach for an amateur was already "no access." The counterfactual for Tomon was not "hire a research team." It was "do not attempt this problem."

The mechanism: what the model is actually doing in a session like this

Understanding why this worked requires being specific about what a large language model does during a math problem session. The model has been trained on an enormous corpus of mathematical text: papers, textbooks, competition solutions, MathOverflow threads, proofs. When you describe a combinatorics problem, the model is not reasoning from first principles in the way a mathematician reasons. It is pattern-matching against structural similarities in its training data. It can say, in effect, "problems shaped like this have often been approached with technique X." That is a useful signal, even when the model cannot execute technique X correctly on its own.

The specific Erdős problem involved combinatorial geometry, a domain with a large body of competition-adjacent literature. That is exactly the kind of domain where LLM training data is dense. A problem in a more obscure or recently-developed subfield would likely have produced worse suggestions.

The second mechanism is what you might call adversarial verification. When Tomon proposed an approach, the model could attempt to poke holes in it. This is not the same as formal proof verification, but it surfaces obvious errors faster than working alone. The model functions as a tireless interlocutor who has read most of the relevant literature. The human supplies the judgment about which objections matter and which are noise.

What the model cannot do: sustain a coherent proof strategy across a very long context without drifting, catch its own subtle errors in symbolic manipulation, or know when a line of reasoning is novel versus when it superficially resembles something that does not apply. The human has to cover those gaps. This is not a criticism of the tool; it is a description of the collaboration.
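The adversarial-verification loop described above has a simple shape that is worth making explicit. The sketch below is a minimal illustration, not Tomon's actual workflow: the model is abstracted as a callable so the structure runs without an API call, and all names and prompts are assumptions introduced here for clarity. In practice, `ask_model` would be a chat request, and the human, not the code, decides which objections matter and how to revise.

```python
def critique_loop(proposal, ask_model, max_rounds=3):
    """Iteratively ask a model to attack a proof sketch.

    `ask_model(prompt) -> str` stands in for a chat call. Each round,
    the model is asked to find the weakest step; the transcript records
    the argument plus every objection raised so far.
    """
    history = [proposal]
    for round_number in range(max_rounds):
        objection = ask_model(
            "Find the weakest step in this argument and explain why "
            "it might fail:\n\n" + history[-1]
        )
        if "no gap found" in objection.lower():
            break  # the model could not poke a hole this round
        # In a real session the human revises the argument here;
        # this sketch only records the exchange.
        history.append(
            history[-1] + "\n\nObjection " + str(round_number + 1) + ": " + objection
        )
    return history


# A stub model that raises one objection, then concedes, to show the shape:
def stub_model(prompt):
    if "Objection 1" in prompt:
        return "No gap found."
    return "The bound in step 2 assumes the sets are disjoint."


transcript = critique_loop("Sketch: bound the intersections, then sum.", stub_model)
```

The point of the structure is the stopping condition: the loop ends either when the model runs out of objections or when the human's round budget runs out, which mirrors the "tireless interlocutor" dynamic without pretending the model's concession is a proof.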

The last time this kind of access shift happened in mathematics

The relevant historical comparison is not "AI beats grandmaster at chess." It is the arrival of computational tools in mathematics more broadly. When computer algebra systems like Mathematica arrived in the late 1980s, the initial reaction from professional mathematicians was similar to the current AI discourse: this is a toy, it makes mistakes, real mathematicians do not need it. Within a decade, it had changed how entire subfields operated. Not by replacing the mathematicians, but by making certain classes of calculation cheap enough to attempt speculatively. You could check a conjecture numerically before investing weeks in a formal proof. The activation energy for exploration dropped.

The four-color theorem, proved in 1976 with computer assistance, was the flashpoint for that era's version of this argument. Critics argued the proof was not "real" because it relied on exhaustive computer verification of cases no human could check manually. The argument has largely been set aside. The proof is valid. The tool changed what was tractable.

The Tomon result sits in a different category than the four-color proof because it is not primarily a computation. It is a conceptual argument assisted by a language model. That is newer territory. The closer analog might be the role of correspondence in 19th-century mathematics, where non-institutional mathematicians like Ramanujan could participate in frontier work by finding a knowledgeable correspondent. The model is a correspondent who is always available and has read everything.

The pattern from computational tools applies here

Every time a new tool has lowered the cost of mathematical exploration, the initial use cases have been at the edges of established research: problems that professionals deprioritized, amateurs who previously lacked access, and subfields where the tool's training data happened to be dense. That is exactly the profile of this result.

What history also shows is that these tools do not flatten expertise. Mathematica made computation cheap, but it did not make mathematical judgment cheap. The people who used it best were still the people who understood which computations were worth running. The same constraint applies here. Tomon did not need a PhD, but he needed enough mathematical depth to direct the collaboration. That bar is lower than the bar for solving the problem solo. It is not zero. For a more detailed look at how ChatGPT compares to other reasoning-capable models for technical work, the Claude vs ChatGPT comparison covers the practical differences in how each model handles symbolic and logical reasoning tasks. There is also a separate question about the broader Erdős conjecture landscape: hundreds of open problems across combinatorics and number theory, many of which have cash prizes attached (Erdős famously offered money for solutions). Some of those problems are probably now more tractable for a sufficiently motivated amateur with good prompting habits and six months of evenings.

If you want to run a version of this experiment yourself

The earliest you could act on this productively is now, on a 30-day trial horizon. Here is why that timeline is specific. The workflow Tomon demonstrated is not exotic. It requires a ChatGPT subscription, a well-scoped problem from an existing open-problem list, and enough background in the relevant area to evaluate model outputs critically. If you have a technical background in any quantitative field and there is an open problem in that domain you have thought about casually, the cost of running a structured 30-day experiment is $20 plus your time.

The reason to start now rather than waiting for a more capable model: the intuition you will develop about how to direct an AI collaborator on hard problems is itself valuable and not trivially transferable when you switch models. Learning the failure modes of the current tool is part of the skill.

If your interest is more in evaluating AI for research workflows at an organizational level rather than solo exploration, the Perplexity vs ChatGPT comparison is a reasonable starting point for understanding where each tool sits on the retrieval-versus-reasoning spectrum. For problems like the Erdős case, reasoning depth matters more than retrieval freshness. That distinction should drive your tool choice before you invest the setup time.
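A 30-day experiment like this benefits from a written record of which approaches failed and how often you had to correct the model, since "learning the failure modes" is the stated payoff. Below is a minimal sketch of such a log; the field names and example entries are illustrative assumptions, not a prescribed format.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    """One evening's AI-assisted session on the problem."""
    approach: str                                      # e.g. "double counting"
    model_errors: list = field(default_factory=list)   # mistakes you caught
    promising: bool = False                            # your judgment, not the model's


def summarize(sessions):
    """Return the approaches still worth pursuing and the total number
    of model errors caught -- the two quantities worth tracking over
    a 30-day run."""
    open_leads = [s.approach for s in sessions if s.promising]
    corrections = sum(len(s.model_errors) for s in sessions)
    return open_leads, corrections


# Hypothetical log entries for illustration:
log = [
    Session("direct induction", model_errors=["off-by-one in base case"]),
    Session("probabilistic bound", promising=True),
]
leads, fixes = summarize(log)
```

The design choice worth noting: `promising` is a human judgment recorded after the session, not something inferred from the model's confidence, which keeps the log honest about who supplied the direction.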


Some links in this article are affiliate links.