Former DeepSeek researcher and team unveil RAGEN for training more reliable AI agents
RAGEN represents both a technical and conceptual advancement toward autonomous AI agents capable of reasoning and adaptation.
April 23, 2025 — A collaborative team from Northwestern University, Microsoft, Stanford, and the University of Washington, including former DeepSeek researcher Zihan Wang, has developed RAGEN, a new method for training and evaluating AI agents. The system aims to address the brittleness of current AI agents, making them more reliable for real-world applications.
The Challenge: AI Agents Stuck in "Corporate Purgatory"
Despite 2025 being dubbed the "year of AI agents," most implementations remain experimental, according to a VentureBeat poll. RAGEN tackles this by focusing on multi-turn, interactive settings where agents must adapt, remember, and reason under uncertainty.
How RAGEN Works
Built on the StarPO (State-Thinking-Actions-Reward Policy Optimization) framework, RAGEN emphasizes learning through experience rather than memorization. Key features include:
- Rollout stage: LLMs generate complete interaction sequences guided by reasoning.
- Update stage: Models are optimized using normalized cumulative rewards (a rough sketch of this loop follows the list).
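In pseudocode terms, the two stages form a simple loop. The sketch below is a minimal illustration of that loop, not RAGEN's actual implementation; the `generate_trajectory` and `policy_gradient_step` helpers are hypothetical placeholders, and the batch normalization of returns shown here is one common choice.

```python
import numpy as np

def train_step(policy, env, num_rollouts=8, gamma=1.0):
    """One StarPO-style iteration: roll out full trajectories, then update
    the policy with reward-normalized returns. Purely illustrative."""
    trajectories, returns = [], []

    # Rollout stage: the LLM generates a complete multi-turn interaction,
    # interleaving reasoning with actions, and collects per-step rewards.
    for _ in range(num_rollouts):
        traj = generate_trajectory(policy, env)          # hypothetical helper
        cumulative = sum(gamma ** t * r for t, r in enumerate(traj.rewards))
        trajectories.append(traj)
        returns.append(cumulative)

    # Update stage: normalize cumulative rewards across the batch so that
    # trajectories are weighted by relative, not absolute, return.
    returns = np.array(returns, dtype=np.float32)
    advantages = (returns - returns.mean()) / (returns.std() + 1e-8)

    for traj, adv in zip(trajectories, advantages):
        policy_gradient_step(policy, traj, adv)          # hypothetical helper
```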
The team tested RAGEN using Alibaba’s Qwen models (1.5 and 2.5), chosen for their open weights and instruction-following capabilities.
The "Echo Trap" Problem
Wang highlighted a core issue in a widely shared X thread: RL systems often reward shortcuts, leading to repetitive behaviors and degraded performance. This "Echo Trap" manifests as reward variance cliffs and disappearing reasoning traces.
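One way to surface this failure mode in monitoring, assuming a collapse in reward variance is the signal to watch (the window size and threshold below are illustrative choices, not values from the paper), is to compare the spread of returns early in training against the most recent batches:

```python
import numpy as np

def collapse_signal(batch_returns, window=10, var_drop=0.1):
    """Heuristic 'echo trap' detector: flags training if recent reward
    variance has fallen to a small fraction of its earlier level.
    batch_returns: list of per-batch arrays of trajectory returns."""
    if len(batch_returns) < 2 * window:
        return False
    early = np.concatenate(batch_returns[:window])
    recent = np.concatenate(batch_returns[-window:])
    return recent.var() < var_drop * (early.var() + 1e-8)
```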
Testing Environments
RAGEN evaluates agents across three symbolic tasks:
- Bandit: Single-turn, stochastic risk-reward reasoning.
- Sokoban: Multi-turn, deterministic puzzle-solving.
- Frozen Lake: Stochastic, multi-turn adaptive planning (a generic interaction loop on this environment is sketched below).
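To make "stochastic, multi-turn" concrete, the snippet below runs a generic agent loop on Gymnasium's FrozenLake-v1. The random `choose_action` policy is a hypothetical stand-in for an LLM agent, and this is not RAGEN's environment interface.

```python
import random
import gymnasium as gym

def choose_action(observation, n_actions):
    """Hypothetical stand-in for an LLM policy: here it just acts randomly."""
    return random.randrange(n_actions)

env = gym.make("FrozenLake-v1", is_slippery=True)  # slippery => stochastic transitions
obs, info = env.reset(seed=0)

total_reward, done = 0.0, False
while not done:
    action = choose_action(obs, env.action_space.n)
    obs, reward, terminated, truncated, info = env.step(action)
    total_reward += reward
    done = terminated or truncated

print(f"episode return: {total_reward}")
```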
Stabilizing Training with StarPO-S
To combat training collapse, the team introduced StarPO-S, which incorporates:
- Uncertainty-based rollout filtering.
- KL penalty removal for greater exploration.
- Asymmetric PPO clipping to amplify high-reward trajectories (sketched, together with the filtering step, in the example below).
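The sketch below shows what asymmetric clipping and uncertainty-based filtering might look like in isolation. The clip bounds, the use of return standard deviation as the uncertainty signal, and the keep fraction are all illustrative assumptions rather than values taken from the paper.

```python
import torch

def asymmetric_ppo_loss(logp_new, logp_old, advantages,
                        clip_low=0.2, clip_high=0.28):
    """PPO-style surrogate with asymmetric clip bounds: a wider upper bound
    lets high-advantage (high-reward) trajectories push the policy harder.
    No KL penalty term is added. Bound values are illustrative."""
    ratio = torch.exp(logp_new - logp_old)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_low, 1.0 + clip_high) * advantages
    return -torch.mean(torch.min(unclipped, clipped))

def filter_rollouts_by_uncertainty(groups, keep_fraction=0.5):
    """Keep only the rollout groups whose returns vary the most, i.e. the
    prompts the policy is most uncertain about; drop near-deterministic ones.
    'groups' maps a prompt id to a tensor of returns for its rollouts."""
    scored = sorted(groups.items(),
                    key=lambda kv: kv[1].std().item(), reverse=True)
    keep = max(1, int(len(scored) * keep_fraction))
    return dict(scored[:keep])
```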
Key Insights for Effective Agent Training
- Task diversity improves generalization.
- Interaction granularity enables meaningful planning.
- Rollout freshness aligns training data with current policies (see the illustrative configuration below).
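In practice these insights map onto a few training knobs. The configuration below is a hypothetical illustration of such knobs, not RAGEN's actual configuration format, and the specific numbers are arbitrary.

```python
# Hypothetical training configuration illustrating the three insights above.
training_config = {
    # Task diversity: mix environments and many problem instances.
    "environments": ["bandit", "sokoban", "frozen_lake"],
    "instances_per_environment": 512,

    # Interaction granularity: allow a few actions per turn so the agent
    # can plan, but cap them to keep credit assignment tractable.
    "max_actions_per_turn": 3,
    "max_turns_per_episode": 10,

    # Rollout freshness: regenerate rollouts with the current policy
    # frequently instead of reusing stale, off-policy data.
    "rollouts_regenerated_every_n_updates": 1,
}
```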
An interactive demo visualizes agent rollouts, including intermediate reasoning steps.
Open-Source Release
RAGEN, along with the StarPO and StarPO-S frameworks, is now available on GitHub, though licensing details are pending.
Unanswered Questions
- How transferable is RAGEN beyond symbolic tasks?
- Can reasoning be sustained over longer horizons?
- What are the implications for enterprise adoption?
RAGEN represents a significant step toward autonomous, reasoning-capable AI agents, though real-world deployment challenges remain.
About the Author

Dr. Lisa Kim
AI Ethics Researcher
A leading expert in AI ethics and responsible AI development with 13 years of research experience. A former member of the Microsoft AI Ethics Committee, she now consults for multiple international AI governance organizations and regularly contributes articles on AI ethics to top-tier journals such as Nature and Science.