ChatGPT makes mistakes, as do other large language models (LLMs) like it; these mistakes are usually referred to as “hallucinations”. Previously, hallucinations were often framed as a bug that would be fixed once the next version of the model was released. But an article recently published by OpenAI [1], the company behind ChatGPT, indicates that hallucinations are a mathematical inevitability of how these systems work, not a problem that can simply be solved with better engineering [2].
Depending on how we frame LLMs, this may come as a surprise. We might, for example, imagine an LLM the way the term “artificial intelligence” encourages us to: as an entity capable of reasoning. But LLMs cannot reason; even large reasoning models, designed specifically for that task, still fail to solve complex puzzles like the Tower of Hanoi [3]. Alternatively, we might think of LLMs as calculators: if they just had the correct input, surely they would give us the correct output. But research has shown that even when LLMs are connected to databases or search engines, they still make errors. One study connected GPT-4 to Wolfram Alpha and found that it still “fail[ed] on some problems that even middling high school students would find easy” [4].
Ultimately, we must remember that LLMs are neither reasoning entities nor calculators; they are predictive text engines. As Hicks, Humphries, and Slater put it, “The problem here isn’t that large language models hallucinate, lie, or misrepresent the world in some way. It’s that they are not designed to represent the world at all; instead, they are designed to convey convincing lines of text” [5]. The text may be convincing, but we cannot assume that it is true.
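To make the “predictive text engine” framing concrete, here is a deliberately tiny sketch, nothing like ChatGPT’s actual implementation: a toy model that chooses each next word only by how often it followed the previous word in a few made-up training sentences. The corpus, function names, and example outputs below are all invented for illustration.

```python
import random
from collections import defaultdict, Counter

# A toy "language model": a bigram table built from a few made-up sentences.
# (Illustrative only; real LLMs use neural networks trained on vast corpora.)
toy_corpus = (
    "the library opens at nine . "
    "the library opens at noon on sundays . "
    "the archive opens at nine ."
).split()

# Count how often each word follows each preceding word.
bigram_counts = defaultdict(Counter)
for prev, nxt in zip(toy_corpus, toy_corpus[1:]):
    bigram_counts[prev][nxt] += 1

def generate(start, max_words=10):
    """Extend `start` one word at a time, always picking a statistically
    plausible next word -- with no notion of which completion is true."""
    words = [start]
    for _ in range(max_words):
        followers = bigram_counts.get(words[-1])
        if not followers:
            break
        next_word = random.choices(
            list(followers.keys()), weights=list(followers.values())
        )[0]
        if next_word == ".":
            break
        words.append(next_word)
    return " ".join(words)

# Might print "archive opens at nine" (which the toy corpus does say), or
# "archive opens at noon on sundays" -- fluent, but never stated in the
# training text. The model only checks plausibility, not truth.
print(generate("archive"))
```

Real LLMs replace this frequency table with a neural network trained on vastly more text, but the goal is similar in spirit: produce a likely-sounding continuation, not a verified fact.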
We now know that LLM hallucinations are inevitable, even if their frequency can be reduced in the future. Fact-checking has thus become an essential step in the research and writing process when generative AI is used. For help with fact-checking and citation management, you can always consult a librarian.
[1] Kalai, Adam Tauman, Ofir Nachum, Santosh S. Vempala, and Edwin Zhang. “Why Language Models Hallucinate.” arXiv:2509.04664. Preprint, arXiv, September 4, 2025. https://doi.org/10.48550/arXiv.2509.04664.
[2] Swain, Gyana. “OpenAI Admits AI Hallucinations Are Mathematically Inevitable, Not Just Engineering Flaws.” Computerworld, September 18, 2025. https://www.computerworld.com/article/4059383/openai-admits-ai-hallucinations-are-mathematically-inevitable-not-just-engineering-flaws.html.
[3] Mauran, Cecily. “‘The Illusion of Thinking’: Apple Research Finds AI Models Collapse and Give up with Hard Puzzles.” Mashable, June 9, 2025. https://mashable.com/article/apple-research-ai-reasoning-models-collapse-logic-puzzles.
[4] Davis, Ernest, and Scott Aaronson. “Testing GPT-4 with Wolfram Alpha and Code Interpreter Plug-Ins on Math and Science Problems.” arXiv:2308.05713. Preprint, arXiv, February 20, 2025. https://doi.org/10.48550/arXiv.2308.05713.
[5] Hicks, Michael Townsen, James Humphries, and Joe Slater. “ChatGPT Is Bullshit.” Ethics and Information Technology 26, no. 2 (2024): 38. https://doi.org/10.1007/s10676-024-09775-5.