ARC-AGI-3 Benchmark Reveals AI Struggles With Novel Problem Solving
The latest ARC-AGI-3 benchmark tests AI systems on unfamiliar tasks, showing humans still outperform models in basic cognitive challenges.
AI researcher François Chollet and his team have released ARC-AGI-3, the newest version of their benchmark designed to evaluate general intelligence in AI systems. Unlike traditional tests, this benchmark focuses on an AI's ability to tackle completely unfamiliar problems without task-specific prior knowledge or hints.
Key Features of ARC-AGI-3
- Interactive mini-games replace static problems, requiring AI agents to discover rules and objectives independently
- Tests only "core knowledge priors" like object permanence and causality, excluding language or cultural references
- Humans solve challenges quickly, while current AI models consistently fail
Current Results
- A developer preview offers three test games
- The leaderboard is dominated by humans, with only one unidentified AI entry scoring any points
- OpenAI researcher Zhiqing Sun claims the ChatGPT agent can solve the first game, but the claim has not yet been verified
Why This Matters
The benchmark's interactive format mirrors human learning through:
- Exploration
- Planning
- Adaptation
"As long as that gap remains, we do not have AGI," states the project team, referring to the gap between human and AI performance on these tasks.
What's Next
- Hugging Face is sponsoring a $10,000 competition for the best-performing agents
- The full benchmark (100+ games) is expected by early 2026
- More details at arcprize.org
This development highlights the significant gap between human and artificial intelligence in flexible, novel problem-solving, a crucial hurdle in the pursuit of true AGI.
About the Author

Dr. Emily Wang
AI Product Strategy Expert
Former Google AI Product Manager with 10 years of experience in AI product development and strategy. She has led multiple successful AI products from zero to one, and now provides product strategy consulting for AI startups while writing AI product analysis articles for various tech media outlets.