Chinese VC firm launches dynamic AI benchmark Xbench
HongShan Capital Group developed Xbench to evaluate AI models on reasoning and real-world tasks, and is now open-sourcing it for public use along with a leaderboard comparing top models.
HongShan Capital Group (HSG), a Chinese venture capital firm, has developed Xbench, a novel AI benchmarking system designed to evaluate models not just on academic performance but also on real-world task execution. The benchmark, initially an internal tool for investment assessments, is now being open-sourced for public use.
Key Features of Xbench
- Dual Evaluation System:
  - Academic Testing: Like traditional benchmarks, it assesses STEM knowledge (e.g., via Xbench-ScienceQA) using questions vetted by professors.
  - Real-World Tasks: It evaluates practical applications such as recruitment (e.g., sourcing battery engineers) and marketing (matching advertisers with influencers).
- Dynamic Updates: Questions are refreshed quarterly to keep the benchmark relevant, and only part of the dataset is public.
- Chinese-Language Focus: The Xbench-DeepResearch component tests models' ability to navigate Chinese web resources, emphasizing factual consistency and breadth of sources.
Leaderboard Results
Current rankings (as of launch):
- Overall: ChatGPT-o3 leads, followed by ByteDance’s Doubao, Gemini 2.5 Pro, and Grok.
- Recruiting: Perplexity Search and Claude 3.5 Sonnet rank second and third.
- Marketing: Claude, Grok, and Gemini perform strongly.
Expert Endorsement
Zihan Zheng of NYU, lead researcher on the LiveCodeBench Pro benchmark, praised Xbench's ambition to quantify hard-to-measure qualities such as creativity and collaboration, calling it a "promising start."
"It's really difficult for benchmarks to include things that are so hard to quantify," Zheng noted, highlighting Xbench's innovative approach.
Future Plans
HSG plans to expand into finance, legal, accounting, and design categories, though the question sets for these remain private for now.
About the Author

Dr. Lisa Kim
AI Ethics Researcher
Leading expert in AI ethics and responsible AI development with 13 years of research experience. Former member of Microsoft AI Ethics Committee, now provides consulting for multiple international AI governance organizations. Regularly contributes AI ethics articles to top-tier journals like Nature and Science.