OpenAI’s latest reasoning AI models show increased hallucination rates

April 18, 2025•Maxwell Zeff•2 minutes
AI
OpenAI
Hallucination

OpenAI’s new reasoning AI models demonstrate improved capabilities but exhibit higher hallucination rates than previous versions, according to benchmark data.

OpenAI’s recently launched o3 and o4-mini AI models are state-of-the-art in many respects, particularly in coding and math-related tasks. However, these models also hallucinate, or make up information, more often than OpenAI’s older models, including previous reasoning models like o1 and o3-mini.

Benchmark Results Reveal Higher Hallucination Rates

According to OpenAI’s internal tests, the o3 model hallucinated in response to 33% of questions on PersonQA, its in-house benchmark for measuring factual accuracy about people. This is roughly double the hallucination rate of its predecessors, o1 (16%) and o3-mini (14.8%). The o4-mini performed even worse, hallucinating 48% of the time.
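To make the "roughly double" comparison concrete, the short sketch below divides the reported PersonQA figures by one another. The percentages are the ones quoted above; the dictionary and the `relative_rate` helper are illustrative names, not part of any OpenAI tooling.

```python
# Hallucination rates on PersonQA as quoted in the article (percent of questions).
reported_rates = {
    "o1": 16.0,
    "o3-mini": 14.8,
    "o3": 33.0,
    "o4-mini": 48.0,
}


def relative_rate(model: str, baseline: str) -> float:
    """Return how many times higher `model`'s reported rate is than `baseline`'s."""
    return reported_rates[model] / reported_rates[baseline]


if __name__ == "__main__":
    # Compare each new model against each older reasoning model.
    for model in ("o3", "o4-mini"):
        for baseline in ("o1", "o3-mini"):
            ratio = relative_rate(model, baseline)
            print(f"{model} vs {baseline}: {ratio:.2f}x")
```

On these figures, o3 comes out at about 2.1 times the rate of o1 and about 2.2 times that of o3-mini, while o4-mini is about 3 times o1.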

Third-party testing by Transluce, a nonprofit AI research lab, corroborated these findings. Transluce observed instances where o3 fabricated actions, such as claiming to run code on a 2021 MacBook Pro "outside of ChatGPT"—a capability it does not possess.

Why Are Hallucinations Increasing?

OpenAI admits it doesn’t fully understand why hallucinations are worsening with these newer reasoning models. In its technical report, the company notes that "more research is needed" to determine the cause. One hypothesis, proposed by Transluce researcher Neil Chowdhury, suggests that the reinforcement learning techniques used for these models may amplify issues typically mitigated in traditional AI training pipelines.

Practical Implications

While o3 has shown promise in coding workflows—earning praise from Stanford adjunct professor Kian Katanforoosh—its tendency to hallucinate broken website links and other inaccuracies raises concerns. For industries where precision is critical, such as legal or medical fields, these hallucinations could pose significant risks.
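One way a team might guard against hallucinated links is to check every URL in a model's answer before passing it on. The sketch below is a minimal illustration using only the Python standard library; the function names are hypothetical, and the choice to treat any HTTP status of 400 or above as suspect is an assumption, not a method described in the article.

```python
import re
import urllib.error
import urllib.request
from typing import Callable, List
from urllib.parse import urlparse

# Deliberately simple pattern: http(s) URLs up to whitespace or a bracket.
URL_PATTERN = re.compile(r"https?://[^\s<>()]+")


def extract_urls(text: str) -> List[str]:
    """Pull URLs out of free text, trimming trailing punctuation."""
    return [url.rstrip(".,;:!?\"'") for url in URL_PATTERN.findall(text)]


def http_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status for a HEAD request (performs real network I/O)."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses are raised as exceptions; report their code.
        return err.code


def find_suspect_links(text: str, get_status: Callable[[str], int]) -> List[str]:
    """Return URLs in `text` that fail to resolve or answer with status >= 400."""
    suspect = []
    for url in extract_urls(text):
        if not urlparse(url).netloc:
            suspect.append(url)  # no host component at all
            continue
        try:
            status = get_status(url)
        except Exception:
            suspect.append(url)  # DNS failure, timeout, refused connection, ...
            continue
        if status >= 400:
            suspect.append(url)
    return suspect
```

Calling `find_suspect_links(answer_text, http_status)` would run live checks. Passing a stub in place of `http_status` keeps the logic testable offline. Some servers reject HEAD requests, so a flagged link is a prompt for manual review rather than proof of fabrication.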

Potential Solutions

One potential remedy is integrating web search capabilities. OpenAI’s GPT-4o with web search achieves 90% accuracy on SimpleQA, another accuracy benchmark. However, this approach requires exposing user prompts to third-party search providers, which may not always be feasible.

The Broader AI Landscape

The AI industry has increasingly shifted focus to reasoning models after traditional scaling methods showed diminishing returns. Reasoning models improve performance without requiring massive computational resources, but the trade-off appears to be higher hallucination rates.

"Addressing hallucinations across all our models is an ongoing area of research, and we’re continually working to improve their accuracy and reliability," OpenAI spokesperson Niko Felix said.

As AI models continue to evolve, balancing creativity and accuracy will be a critical challenge for developers and businesses alike.

About the Author

Dr. Emily Wang

AI Product Strategy Expert

Former Google AI Product Manager with 10 years of experience in AI product development and strategy formulation. She has led multiple successful AI products from zero to one, and now provides product strategy consulting for AI startups while writing AI product analysis articles for various tech media outlets.

Expertise

AI Product Management
User Experience
Business Strategy
Market Analysis