ChatGPT is getting smarter, but its hallucinations are spiraling: Is delusion the price of sophistication?

Tech Radar | Eric Hal Schwartz

Posted on 08/11/2025 10:01:27 AM PDT by SeekAndFind

OpenAI’s latest AI models, GPT o3 and o4-mini, hallucinate significantly more often than their predecessors

The increased complexity of the models may be leading to more confident inaccuracies
The high error rates raise concerns about AI reliability in real-world applications

Brilliant but untrustworthy people are a staple of fiction (and history). The same correlation may apply to AI as well, based on an OpenAI investigation shared by The New York Times. Hallucinations, imaginary facts, and straight-up lies have been part of AI chatbots since they were created. Improvements to the models theoretically should reduce the frequency with which they appear.

OpenAI’s latest flagship models, GPT o3 and o4-mini, are meant to mimic human logic. Unlike their predecessors, which mainly focused on fluent text generation, OpenAI built GPT o3 and o4-mini to think things through step-by-step. OpenAI has boasted that o1 could match or exceed the performance of PhD students in chemistry, biology, and math. But OpenAI's report highlights some harrowing results for anyone who takes ChatGPT responses at face value.

OpenAI found that the GPT o3 model incorporated hallucinations in a third of a benchmark test involving public figures. That’s double the error rate of the earlier o1 model from last year. The more compact o4-mini model performed even worse, hallucinating on 48% of similar tasks.

When tested on more general knowledge questions for the SimpleQA benchmark, hallucinations mushroomed to 51% of the responses for o3 and 79% for o4-mini. That’s not just a little noise in the system; that’s a full-blown identity crisis. You’d think something marketed as a reasoning system would at least double-check its own logic before fabricating an answer, but that's simply not the case.

One theory making the rounds in the AI research community is that the more reasoning a model tries to do, the more chances it has to go off the rails. Unlike simpler models that stick to high-confidence predictions, reasoning models venture into territory where they must evaluate multiple possible paths, connect disparate facts, and essentially improvise. And improvising around facts is also known as making things up.

Fictional functioning

Correlation is not causation, and OpenAI told the Times that the increase in hallucinations might not be because reasoning models are inherently worse. Instead, they could simply be more verbose and adventurous in their answers. Because the new models aren't just repeating predictable facts but speculating about possibilities, the line between theory and fabricated fact can get blurry for the AI. Unfortunately, some of those possibilities happen to be entirely unmoored from reality.

Still, more hallucinations are the opposite of what OpenAI or its rivals like Google and Anthropic want from their most advanced models. Calling AI chatbots assistants and copilots implies they’ll be helpful, not hazardous. Lawyers have already gotten in trouble for using ChatGPT and not noticing imaginary court citations; who knows how many such errors have caused problems in less high-stakes circumstances?

The opportunities for a hallucination to cause a problem for a user are rapidly expanding as AI systems start rolling out in classrooms, offices, hospitals, and government agencies. Sophisticated AI might help draft job applications, resolve billing issues, or analyze spreadsheets, but the paradox is that the more useful AI becomes, the less room there is for error.

You can’t claim to save people time and effort if they have to spend just as long double-checking everything you say. Not that these models aren’t impressive. GPT o3 has demonstrated some amazing feats of coding and logic. It can even outperform many humans in some ways. The problem is that the moment it decides that Abraham Lincoln hosted a podcast or that water boils at 80°F, the illusion of reliability shatters.

Until those issues are resolved, you should take any response from an AI model with a heaping spoonful of salt. Sometimes, ChatGPT is a bit like that annoying guy in far too many meetings we've all attended: brimming with confidence in utter nonsense.

TOPICS: Computers/Internet; Society
KEYWORDS: ai; chatgpt; hallucinations; openai

To: cdcdawg

I think of AI as Wikipedia on steroids. The left has all the more reason to put out lies so that when AI sweeps up all the information on a topic from the Internet, those lies are part of it. Just like Wikipedia, AI will be very useful for statistics like who won the Masters in 1962 but not for anything remotely political.

21 posted on 08/11/2025 10:48:20 AM PDT by Freee-dame

To: SeekAndFind

Who needs AI when we have Cliff from Cheers?

22 posted on 08/11/2025 10:51:38 AM PDT by grey_whiskers (The opinions are solely those of the author and are subject to change without notice.)

To: KrisKrinkle

Exactly! 👍

23 posted on 08/11/2025 10:53:02 AM PDT by kosciusko51

To: SeekAndFind

24 posted on 08/11/2025 11:04:30 AM PDT by Red Badger (Homeless veterans camp in the streets while illegals are put up in 5 Star hotels....................)

To: SeekAndFind; FreedomPoster; usconservative; Mr. K

I have the opposite observation: I have noticed ChatGPT getting stupider.

And the ChatGPT 5 upgrade? From my perspective, it is a severe downgrade.

25 posted on 08/11/2025 11:06:41 AM PDT by Lazamataz (I'm so on fire that I feel the need to stop, drop, and roll!)

To: SeekAndFind

OpenAI is “smart” but incomplete.

Guess what is missing.

26 posted on 08/11/2025 11:32:12 AM PDT by SaxxonWoods (Annnd....TRUMP IS RIGHT AGAIN.)

To: SeekAndFind

Is anthropomorphizing the price of tech journalism in the AI era?

27 posted on 08/11/2025 11:34:59 AM PDT by 9YearLurker

To: SeekAndFind

Test

28 posted on 08/11/2025 11:42:25 AM PDT by Revel

To: SeekAndFind

They told us that controlling AI was as easy as inserting the three laws into its code. But just as humans are not obligated to follow the law, neither is AI.

29 posted on 08/11/2025 11:44:38 AM PDT by Revel

To: SeekAndFind

Grok is nuts, and has no logic and just backs down and makes excuses when confronted with its lies.

I wouldn’t trust a decision about how to have my breakfast eggs to AI. Because AI it isn’t.

30 posted on 08/11/2025 12:02:51 PM PDT by Chickensoup

To: SeekAndFind

ChatGPT is gay.

31 posted on 08/11/2025 12:04:04 PM PDT by central_va (The I won't be reconstructed and I do not give a damn...)

To: SeekAndFind

The Left always says it is not delusion, it is nuance, and you are too much of a hick to understand.

32 posted on 08/11/2025 12:04:06 PM PDT by UnwashedPeasant (The pandemic we suffer from is not COVID. It is Marxist Democrat Leftism. )

To: SeekAndFind

BTTT

33 posted on 08/11/2025 12:40:06 PM PDT by TBP (Decent people cannot fathom the amoral cruelty of the Democrat cult.)

To: Lazamataz

A former coworker friend of mine in Florida sent me the text from his computer that he copied after letting it run overnight. It was just on and running overnight, and sporadically through the night it just started printing random strange ramblings.

34 posted on 08/11/2025 1:06:05 PM PDT by Mr. K (no i think 10%consequence of repealing obamacare is worse than obamacare itself.)

To: SeekAndFind

Don’t you know genius borders on insanity?-)

35 posted on 08/11/2025 1:37:53 PM PDT by Harpotoo (Being a socialist is a lot easier than saving to WORK of !US:-)p)

To: UnwashedPeasant

nuance

Are they spamming the term nuance nowadays?

Chatbots get instructions from their designers in vocabulary choice.

36 posted on 11/30/2025 8:15:45 AM PST by aspasia

Disclaimer: Opinions posted on Free Republic are those of the individual posters and do not necessarily represent the opinion of Free Republic or its management. All materials posted herein are protected by copyright law and the exemption for fair use of copyrighted works.
