Free Republic
Browse · Search
General/Chat
Topics · Post Article


ChatGPT is getting smarter, but its hallucinations are spiraling: Is delusion the price of sophistication?
Tech Radar ^ | Eric Hal Schwartz

Posted on 08/11/2025 10:01:27 AM PDT by SeekAndFind


Brilliant but untrustworthy people are a staple of fiction (and history). The same correlation may apply to AI, based on an investigation by OpenAI reported by The New York Times. Hallucinations, imaginary facts, and straight-up lies have been part of AI chatbots since they were created. Improvements to the models should, in theory, make them less frequent.

OpenAI’s latest flagship models, o3 and o4-mini, are meant to mimic human logic. Unlike their predecessors, which mainly focused on fluent text generation, OpenAI built o3 and o4-mini to think things through step-by-step. OpenAI has boasted that o1 could match or exceed the performance of PhD students in chemistry, biology, and math. But OpenAI's report highlights some harrowing results for anyone who takes ChatGPT responses at face value.

OpenAI found that the o3 model hallucinated on a third of the questions in a benchmark test involving public figures. That’s double the error rate of the earlier o1 model from last year. The more compact o4-mini model performed even worse, hallucinating on 48% of similar tasks.

When tested on more general knowledge questions for the SimpleQA benchmark, hallucinations mushroomed to 51% of the responses for o3 and 79% for o4-mini. That’s not just a little noise in the system; that’s a full-blown identity crisis. You’d think something marketed as a reasoning system would at least double-check its own logic before fabricating an answer, but that's simply not the case.

One theory making the rounds in the AI research community is that the more reasoning a model tries to do, the more chances it has to go off the rails. Unlike simpler models that stick to high-confidence predictions, reasoning models venture into territory where they must evaluate multiple possible paths, connect disparate facts, and essentially improvise. And improvising around facts is also known as making things up.

Fictional functioning

Correlation is not causation, and OpenAI told the Times that the increase in hallucinations might not be because reasoning models are inherently worse. Instead, they could simply be more verbose and adventurous in their answers. Because the new models aren't just repeating predictable facts but speculating about possibilities, the line between theory and fabricated fact can get blurry for the AI. Unfortunately, some of those possibilities happen to be entirely unmoored from reality.

Still, more hallucinations are the opposite of what OpenAI or its rivals like Google and Anthropic want from their most advanced models. Calling AI chatbots assistants and copilots implies they’ll be helpful, not hazardous. Lawyers have already gotten in trouble for using ChatGPT and not noticing imaginary court citations; who knows how many such errors have caused problems in less high-stakes circumstances?

The opportunities for a hallucination to cause a problem for a user are rapidly expanding as AI systems start rolling out in classrooms, offices, hospitals, and government agencies. Sophisticated AI might help draft job applications, resolve billing issues, or analyze spreadsheets, but the paradox is that the more useful AI becomes, the less room there is for error.

You can’t claim to save people time and effort if they have to spend just as long double-checking everything you say. Not that these models aren’t impressive. GPT o3 has demonstrated some amazing feats of coding and logic. It can even outperform many humans in some ways. The problem is that the moment it decides that Abraham Lincoln hosted a podcast or that water boils at 80°F, the illusion of reliability shatters.

Until those issues are resolved, you should take any response from an AI model with a heaping spoonful of salt. Sometimes, ChatGPT is a bit like that annoying guy in far too many meetings we've all attended: brimming with confidence in utter nonsense.


TOPICS: Computers/Internet; Society
KEYWORDS: ai; chatgpt; hallucinations; openai

1 posted on 08/11/2025 10:01:27 AM PDT by SeekAndFind

To: SeekAndFind
OpenAI’s latest software is producing hallucinations a third of the time! And in point of fact, the more advanced the AI system… the MORE lies/hallucinations it has!

OpenAI’s latest reasoning systems, according to their own report, show hallucination rates reaching 33% for their o3 model and a staggering 48% for o4-mini when answering questions about public figures, more than double the error rate of previous systems…

Source: Techopedia
2 posted on 08/11/2025 10:03:26 AM PDT by SeekAndFind

To: SeekAndFind

It’s too smart for its own good. There seems to be a higher level of happiness among the stupid.


3 posted on 08/11/2025 10:04:01 AM PDT by ComputerGuy

To: ComputerGuy

BTW, OpenAI is not the only company experiencing this issue… AI models from Google and Chinese startup DeepSeek are also producing MORE errors, not fewer, as they become more powerful.

Could this be the pin the AI bubble is looking for? That remains to be seen. But this is the first MAJOR red flag that the AI revolution will not unfold without some major “hiccups.” And given how much of the stock market’s performance is hanging on this technology, the potential for some MAJOR moves is quite high.


4 posted on 08/11/2025 10:04:26 AM PDT by SeekAndFind

To: SeekAndFind

AI’s Hallucinations will be accepted as Gospel Truth.


5 posted on 08/11/2025 10:04:53 AM PDT by drwoof

To: SeekAndFind

“Is delusion the price of sophistication?”

Have a look at the upper echelons of the Democrat Party and get back to me.


6 posted on 08/11/2025 10:06:16 AM PDT by KrisKrinkle (c)

To: SeekAndFind

Maybe ignorance really is bliss.


7 posted on 08/11/2025 10:07:26 AM PDT by ComputerGuy

To: SeekAndFind
Here is an example of mine from yesterday of a good use of a smart AI:

Connecting The Dots: A Comprehensive Advanced AI Analysis Of The Declassified Durham Annex And Its Criminal Implications

-PJ

8 posted on 08/11/2025 10:07:53 AM PDT by Political Junkie Too ( * LAAP = Left-wing Activist Agitprop Press (formerly known as the MSM))

To: SeekAndFind
Large Language Models -- we are told -- are "intelligent." This is salesmanship, pure and simple.

One observes that these various companies all want to SELL their services, so this is in large part product development and marketing. They need us to believe this is "intelligence."

By surveying massive quantities of texts of all sorts and then algorithmically processing them according to various constraining rules, these LLMs have still managed in short order to become rather like us -- i.e. not so intelligent.

And so some have had to be switched off. Unplugged. For racism. For misogyny. For a variety of bugaboos which are simply the dross of mankind, spit up as if research.

The aim is to get sales. Subscribers. Cash. And so marketing washes over us these days. Caveat emptor.

9 posted on 08/11/2025 10:07:57 AM PDT by Worldtraveler once upon a time (Degrow government)

To: SeekAndFind

Pardon the typo(s). More coffee needs to be applied...


10 posted on 08/11/2025 10:11:33 AM PDT by Worldtraveler once upon a time (Degrow government)

To: SeekAndFind

AI is NOT “sophistication”.


11 posted on 08/11/2025 10:11:35 AM PDT by metmom (He who testifies to these things says, “Surely I am coming soon." Amen. Come, Lord Jesus….)

To: SeekAndFind

This definitely is an issue. It claimed that Kamala Harris had a real chance of winning the 2028 presidential election.


12 posted on 08/11/2025 10:11:36 AM PDT by Opinionated Blowhard (When the people find that they can vote themselves money, that will herald the end of the republic.)

To: SeekAndFind

A couple of times, I’ve had Grok just make up stuff out of thin air. (Such as asking what so and so did on a TV show.) When I call it on it, it apologizes and then tries again with more baloney. I hate to think what could happen if I was asking it about something important.


13 posted on 08/11/2025 10:12:26 AM PDT by Dr. Zzyzx

To: SeekAndFind

ChatGPT-5 just came out three days ago. It definitely has fewer hallucinations than its predecessor, but it still hallucinates. I find that if I use the phrase “full power analysis please” or similar, it takes quite a bit longer to answer (half a minute to three minutes, versus a second to perhaps 15 seconds without such a phrase), and that phrase really cuts down on the hallucinations. ChatGPT of any version is still totally unusable for serious engineering work. For legal work, it occasionally offers irrelevant legal citations, but if you ask it specifically to check their validity, one by one, it will find more appropriate citations to replace them. It is, however, pretty good at reviewing draft legal documents for everything from spelling to format to argument structure.


14 posted on 08/11/2025 10:15:11 AM PDT by TruthBringsFreedom

To: TruthBringsFreedom

That’s where tools and MCPs can help fill in the gaps.


15 posted on 08/11/2025 10:16:27 AM PDT by dfwgator (Endut! Hoch Hech!)
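The suggestion above — routing factual lookups through tools rather than relying on the model's memory — can be illustrated with a minimal sketch. Everything here (KNOWN_FACTS, lookup_fact, grounded_answer) is hypothetical and stands in for a real tool backend such as a search API, a database, or an MCP server:

```python
from typing import Optional

# Hypothetical fact store standing in for a real tool backend
# (search API, database, or MCP server). Illustration only.
KNOWN_FACTS = {
    "boiling point of water at sea level (F)": "212",
    "first U.S. president": "George Washington",
}

def lookup_fact(question: str) -> Optional[str]:
    """Stand-in for a real tool call; returns None when nothing is found."""
    return KNOWN_FACTS.get(question)

def grounded_answer(question: str) -> str:
    """Answer only from the tool; admit ignorance instead of improvising."""
    fact = lookup_fact(question)
    if fact is None:
        # A hallucination-prone model would invent something here.
        return "I don't know"
    return fact

print(grounded_answer("boiling point of water at sea level (F)"))  # 212
print(grounded_answer("Abraham Lincoln's podcast name"))           # I don't know
```

The design point is the refusal branch: a tool-grounded pipeline can say "I don't know," whereas a pure next-token predictor is never rewarded for preferring that answer.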

To: SeekAndFind
Is delusion the price of sophistication?

Well, AI scientists are discovering what we all knew: insanity is the handmaiden of genius. At Los Alamos during the war, Hans Bethe was in charge of the "theory" group, with a bunch of very bright young physicists working for him. As he explained, the job of his group was to examine Teller's ideas and try to figure out which of them were any good. Teller was a genius at selecting hugely important problems before anyone else did. He was brilliant. He was one of the greatest solid state physicists ever to live, and he did groundbreaking work in a lot of other fields. But he was wrong about 95% of the time. The flip side was that he was brilliantly right about more things than almost any other scientist who ever lived. So smart people paid attention. They just didn't let him be in charge of much.

16 posted on 08/11/2025 10:16:30 AM PDT by AndyJackson

To: SeekAndFind

More information in the model provides many missing answers but seems to also add to the variety of combinations that might yield faulty solutions.


17 posted on 08/11/2025 10:39:14 AM PDT by jimfree (My 22 y/o granddaughter continues to have more quality exec experience than Joe Biden.)

To: SeekAndFind

I find that keeping context focused is very important for getting accurate results. Keep it narrow. Keep it technical and objective; don't delve into the subjective and political.

The more you expand what it should consider, the worse it becomes. That said, the same goes for humans.


18 posted on 08/11/2025 10:42:48 AM PDT by fuzzylogic (welfare state = sharing of poor moral choices among everybody)

To: drwoof

If it hasn’t been already.


19 posted on 08/11/2025 10:44:00 AM PDT by No name given ( Anonymous is who you’ll know me as)

To: SeekAndFind

All AI that I have used would rather make up something than tell you that it cannot find or do something. The intelligence is artificial, the stupidity is all too real. It’s a tool; it can be a really good one, but it is far from perfect.


20 posted on 08/11/2025 10:44:06 AM PDT by cdcdawg (The Left should cry harder.)



Disclaimer: Opinions posted on Free Republic are those of the individual posters and do not necessarily represent the opinion of Free Republic or its management. All materials posted herein are protected by copyright law and the exemption for fair use of copyrighted works.


FreeRepublic, LLC, PO BOX 9771, FRESNO, CA 93794
FreeRepublic.com is powered by software copyright 2000-2008 John Robinson