Former DeepSeek researcher and team unveil RAGEN for training more reliable AI agents
RAGEN represents both a technical and conceptual advancement toward autonomous AI agents capable of reasoning and adaptation.
April 23, 2025 — A collaborative team from Northwestern University, Microsoft, Stanford, and the University of Washington, including former DeepSeek researcher Zihan Wang, has developed RAGEN, a new method for training and evaluating AI agents. The system aims to address the brittleness of current AI agents, making them more reliable for real-world applications.
The Challenge: AI Agents Stuck in "Corporate Purgatory"
Despite 2025 being dubbed the "year of AI agents," most implementations remain experimental, according to a VentureBeat poll. RAGEN tackles this by focusing on multi-turn, interactive settings where agents must adapt, remember, and reason under uncertainty.
How RAGEN Works
Built on the StarPO (State-Thinking-Actions-Reward Policy Optimization) framework, RAGEN emphasizes learning through experience rather than memorization. Key features include:
- Rollout stage: LLMs generate complete interaction sequences guided by reasoning.
- Update stage: Models are optimized using normalized cumulative rewards (a rough sketch of this loop follows the list).
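In pseudocode terms, the two stages form a simple loop. The sketch below is a minimal illustration of that loop, not RAGEN's actual implementation; the `generate_trajectory` and `policy_gradient_step` helpers are hypothetical placeholders, and the batch normalization of returns shown here is one common choice.

```python
import numpy as np

def train_step(policy, env, num_rollouts=8, gamma=1.0):
    """One StarPO-style iteration: roll out full trajectories, then update
    the policy with reward-normalized returns. Purely illustrative."""
    trajectories, returns = [], []

    # Rollout stage: the LLM generates a complete multi-turn interaction,
    # interleaving reasoning with actions, and collects per-step rewards.
    for _ in range(num_rollouts):
        traj = generate_trajectory(policy, env)          # hypothetical helper
        cumulative = sum(gamma ** t * r for t, r in enumerate(traj.rewards))
        trajectories.append(traj)
        returns.append(cumulative)

    # Update stage: normalize cumulative rewards across the batch so that
    # trajectories are weighted by relative, not absolute, return.
    returns = np.array(returns, dtype=np.float32)
    advantages = (returns - returns.mean()) / (returns.std() + 1e-8)

    for traj, adv in zip(trajectories, advantages):
        policy_gradient_step(policy, traj, adv)          # hypothetical helper
```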
The team tested RAGEN using Alibaba’s Qwen models (1.5 and 2.5), chosen for their open weights and instruction-following capabilities.
The "Echo Trap" Problem
Wang highlighted a core issue in a widely shared X thread: RL systems often reward shortcuts, leading to repetitive behaviors and degraded performance. This "Echo Trap" manifests as reward variance cliffs and disappearing reasoning traces.
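One way to surface this failure mode in monitoring, assuming a collapse in reward variance is the signal to watch (the window size and threshold below are illustrative choices, not values from the paper), is to compare the spread of returns early in training against the most recent batches:

```python
import numpy as np

def collapse_signal(batch_returns, window=10, var_drop=0.1):
    """Heuristic 'echo trap' detector: flags training if recent reward
    variance has fallen to a small fraction of its earlier level.
    batch_returns: list of per-batch arrays of trajectory returns."""
    if len(batch_returns) < 2 * window:
        return False
    early = np.concatenate(batch_returns[:window])
    recent = np.concatenate(batch_returns[-window:])
    return recent.var() < var_drop * (early.var() + 1e-8)
```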
Testing Environments
RAGEN evaluates agents across three symbolic tasks:
- Bandit: Single-turn, stochastic risk-reward reasoning.
- Sokoban: Multi-turn, deterministic puzzle-solving.
- Frozen Lake: Stochastic, multi-turn adaptive planning (a generic interaction loop on this environment is sketched below).
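To make "stochastic, multi-turn" concrete, the snippet below runs a generic agent loop on Gymnasium's FrozenLake-v1. The random `choose_action` policy is a hypothetical stand-in for an LLM agent, and this is not RAGEN's environment interface.

```python
import random
import gymnasium as gym

def choose_action(observation, n_actions):
    """Hypothetical stand-in for an LLM policy: here it just acts randomly."""
    return random.randrange(n_actions)

env = gym.make("FrozenLake-v1", is_slippery=True)  # slippery => stochastic transitions
obs, info = env.reset(seed=0)

total_reward, done = 0.0, False
while not done:
    action = choose_action(obs, env.action_space.n)
    obs, reward, terminated, truncated, info = env.step(action)
    total_reward += reward
    done = terminated or truncated

print(f"episode return: {total_reward}")
```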
Stabilizing Training with StarPO-S
To combat training collapse, the team introduced StarPO-S, which incorporates:
- Uncertainty-based rollout filtering.
- KL penalty removal for greater exploration.
- Asymmetric PPO clipping to amplify high-reward trajectories (sketched, together with the filtering step, in the example below).
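The sketch below shows what asymmetric clipping and uncertainty-based filtering might look like in isolation. The clip bounds, the use of return standard deviation as the uncertainty signal, and the keep fraction are all illustrative assumptions rather than values taken from the paper.

```python
import torch

def asymmetric_ppo_loss(logp_new, logp_old, advantages,
                        clip_low=0.2, clip_high=0.28):
    """PPO-style surrogate with asymmetric clip bounds: a wider upper bound
    lets high-advantage (high-reward) trajectories push the policy harder.
    No KL penalty term is added. Bound values are illustrative."""
    ratio = torch.exp(logp_new - logp_old)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_low, 1.0 + clip_high) * advantages
    return -torch.mean(torch.min(unclipped, clipped))

def filter_rollouts_by_uncertainty(groups, keep_fraction=0.5):
    """Keep only the rollout groups whose returns vary the most, i.e. the
    prompts the policy is most uncertain about; drop near-deterministic ones.
    'groups' maps a prompt id to a tensor of returns for its rollouts."""
    scored = sorted(groups.items(),
                    key=lambda kv: kv[1].std().item(), reverse=True)
    keep = max(1, int(len(scored) * keep_fraction))
    return dict(scored[:keep])
```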
Key Insights for Effective Agent Training
- Task diversity improves generalization.
- Interaction granularity enables meaningful planning.
- Rollout freshness aligns training data with current policies (see the illustrative configuration below).
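In practice these insights map onto a few training knobs. The configuration below is a hypothetical illustration of such knobs, not RAGEN's actual configuration format, and the specific numbers are arbitrary.

```python
# Hypothetical training configuration illustrating the three insights above.
training_config = {
    # Task diversity: mix environments and many problem instances.
    "environments": ["bandit", "sokoban", "frozen_lake"],
    "instances_per_environment": 512,

    # Interaction granularity: allow a few actions per turn so the agent
    # can plan, but cap them to keep credit assignment tractable.
    "max_actions_per_turn": 3,
    "max_turns_per_episode": 10,

    # Rollout freshness: regenerate rollouts with the current policy
    # frequently instead of reusing stale, off-policy data.
    "rollouts_regenerated_every_n_updates": 1,
}
```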
An interactive demo visualizes agent rollouts, including intermediate reasoning steps.
Open-Source Release
RAGEN, along with the StarPO and StarPO-S frameworks, is now available on GitHub, though licensing details are pending.
Unanswered Questions
- How transferable is RAGEN beyond symbolic tasks?
- Can reasoning be sustained over longer horizons?
- What are the implications for enterprise adoption?
RAGEN represents a significant step toward autonomous, reasoning-capable AI agents, though real-world deployment challenges remain.
About the Author

Dr. Lisa Kim
AI Ethics Researcher
A leading expert in AI ethics and responsible AI development with 13 years of research experience. A former member of the Microsoft AI Ethics Committee, she now consults for multiple international AI governance organizations and regularly contributes articles on AI ethics to top-tier journals such as Nature and Science.