Debug-gym: Can AI agents lighten developers' debugging load?

April 10, 2025 • Brenda Potts • 2 minutes

Debugging

SoftwareDevelopment

Developers spend a lot of time debugging code. Learn how debug-gym can equip AI agents to help, enabling them to set breakpoints, navigate the codebase, and print runtime variable values on demand, so they better understand the code and its execution flow.

A graphic with a gradient background transitioning from blue on the left to pink on the right. The graphic features a white outline of a computer monitor with code brackets on the screen, an arrow pointing downwards into the monitor, and another arrow curving around to point upwards towards a magnifying glass with a bug icon inside it.

The Growing Role of AI in Coding

With the rise of AI coding tools like GitHub Copilot, developers increasingly rely on AI to generate code. GitHub CEO Thomas Dohmke predicted that "sooner than later, 80% of the code is going to be written by Copilot." The trend is already visible in startups: for a quarter of the companies in Y Combinator’s latest batch, 95% of the code was written by large language models (LLMs).

However, most developers spend the majority of their time debugging code, not writing it. This raises an important question: Can AI tools also assist in debugging?

Introducing Debug-gym

Microsoft Research has developed debug-gym, an environment that equips AI agents with interactive debugging tools. Unlike traditional AI coding tools, debug-gym allows agents to:

Set breakpoints
Navigate codebases
Print runtime variable values
Create test functions
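To make the idea of text-driven debugging concrete, here is a minimal sketch that uses Python's standard `pdb` module, not debug-gym itself. It pauses a buggy function at entry (much like a breakpoint on its first line), steps through it, and prints runtime variable values, all driven by a scripted list of text commands of the kind an agent might emit. The `mean` function, the `debug_session` helper, and the command list are hypothetical and exist only for this illustration.

```python
import io
import pdb


def mean(values):
    total = sum(values)
    count = len(values) - 1  # deliberate off-by-one bug for the demo
    return total / count


def debug_session(func, args, commands):
    """Run func(*args) under pdb, feeding it text commands.

    Returns the function's result and the debugger's text transcript.
    """
    script = io.StringIO("\n".join(commands) + "\n")
    transcript = io.StringIO()
    # nosigint/readrc keep the session independent of signal handlers and .pdbrc files.
    debugger = pdb.Pdb(stdin=script, stdout=transcript, nosigint=True, readrc=False)
    # runcall pauses at the first line of func, then reads commands from `script`.
    result = debugger.runcall(func, *args)
    return result, transcript.getvalue()


result, transcript = debug_session(
    mean,
    ([2, 4, 6],),
    ["next", "next", "p (total, count)", "continue"],
)
print(transcript)
print("returned:", result)
```

The transcript is plain text, so a language model can read it directly: the printed pair shows that `count` is one less than the number of values, which points at the faulty line.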

Figure 1: Diagram demonstrating the code-repairing process in outline. Left: conventional code-repairing system; right: additional tools enabled by debug-gym.

Key Features of Debug-gym

Repository-level information: Agents can navigate and edit files across the entire repository.
Robust and safe: Code runs in sandboxed Docker containers to prevent harmful actions.
Extensible: Easily add new tools to the environment.
Text-based: Compatible with modern LLM-based agents.
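The "extensible" and "text-based" properties can be pictured with a small sketch: an environment that maps text commands to tool functions and returns text observations. This is an illustrative assumption about the general pattern, not debug-gym's actual API. The `ToolBox` class, the `view` and `search` tools, and the in-memory `FILES` repository are all hypothetical names.

```python
from typing import Callable, Dict

# Hypothetical in-memory "repository" standing in for real files.
FILES = {"calc.py": "def add(a, b):\n    return a - b\n"}


class ToolBox:
    """Hypothetical registry mapping a text command name to a tool function."""

    def __init__(self) -> None:
        self._tools: Dict[str, Callable[[str], str]] = {}

    def register(self, name: str, func: Callable[[str], str]) -> None:
        # Adding a new tool is a single registration call.
        self._tools[name] = func

    def step(self, action: str) -> str:
        """Parse 'name argument' and return the tool's text observation."""
        name, _, argument = action.strip().partition(" ")
        tool = self._tools.get(name)
        if tool is None:
            return f"unknown tool: {name}"
        return tool(argument)


def view(path: str) -> str:
    """Return the contents of a file, or an error message as text."""
    return FILES.get(path, f"no such file: {path}")


def search(pattern: str) -> str:
    """Return 'path:line: text' for every line containing the pattern."""
    hits = [
        f"{path}:{number}: {line.strip()}"
        for path, text in FILES.items()
        for number, line in enumerate(text.splitlines(), start=1)
        if pattern in line
    ]
    return "\n".join(hits) or "no matches"


toolbox = ToolBox()
toolbox.register("view", view)
toolbox.register("search", search)

print(toolbox.step("search return"))
```

Because both the actions and the observations are strings, any LLM-based agent that can produce and read text could drive such an interface without special bindings.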

Early Results and Future Work

Initial experiments show promising results. While current AI tools struggle with complex debugging tasks, debug-gym-enabled agents show significant improvement. For example, on the SWE-bench Lite benchmark, agents with debugging tools performed better than those without.

Figure 2: Success rate, measured as the percentage of the 300 SWE-bench Lite issues resolved, for agents with and without debugging tools.

About the Author

David Chen
