Chinese VC firm launches dynamic AI benchmark Xbench

June 24, 2025 • Caiwei Chen • Original Link • 2-minute read
AI
Benchmark
VentureCapital

HongShan Capital Group developed Xbench to evaluate AI models on real-world tasks and reasoning, and is now open-sourcing it for public use, along with a leaderboard comparing top models.

HongShan Capital Group (HSG), a Chinese venture capital firm, has developed Xbench, a new AI benchmark designed to evaluate models not only on academic tests but also on their ability to carry out real-world tasks. Originally built as an internal tool to support investment assessments, the benchmark is now being open-sourced for public use.

Key Features of Xbench

  • Dual Evaluation System:

    • Academic Testing: Like traditional benchmarks, it tests STEM knowledge (e.g., through Xbench-ScienceQA) using questions vetted by professors.
    • Real-World Tasks: It evaluates performance on practical tasks such as recruiting (e.g., sourcing battery engineers) and marketing (e.g., matching advertisers with influencers).
  • Dynamic Updates: Questions are refreshed quarterly to keep the benchmark relevant, and only part of the dataset is made public.

  • Chinese-Language Focus: The Xbench-DeepResearch component tests how well models navigate Chinese-language web resources, with an emphasis on factual consistency and breadth of sources.

Leaderboard Results

Current rankings (as of launch):

  • Overall: ChatGPT-o3 leads, followed by ByteDance’s Doubao, Gemini 2.5 Pro, and Grok.
  • Recruiting: Perplexity Search and Claude 3.5 Sonnet rank second and third.
  • Marketing: Claude, Grok, and Gemini perform strongly.

Expert Endorsement

Zihan Zheng of NYU, lead researcher on LiveCodeBench Pro, praised Xbench’s ambition to quantify hard-to-measure qualities such as creativity and collaboration, calling it a "promising start." "It’s really difficult for benchmarks to include things that are so hard to quantify," Zheng noted.

Future Plans

HSG plans to add finance, legal, accounting, and design categories, though the question sets for these remain private for now.
