Sakana AI's TreeQuest Boosts LLM Performance by 30% with Multi-Model Collaboration

Japanese AI lab Sakana AI has unveiled a technique called Multi-LLM AB-MCTS (Adaptive Branching Monte Carlo Tree Search), which lets multiple large language models (LLMs) collaborate on a single complex task, reportedly outperforming individual models by 30%. The method, detailed in the lab's research paper, uses Monte Carlo Tree Search (MCTS) to decide, step by step, which LLM is best suited to make the next attempt.

Key Innovations

Adaptive Branching Search: At each step the algorithm chooses between "searching deeper" (refining an existing solution) and "searching wider" (generating a new solution), so the search budget goes where it is most likely to pay off.
Multi-Model Collaboration: The system assigns each step to one of several models, such as OpenAI's o4-mini, Gemini 2.5 Pro, and DeepSeek-R1, to draw on their different strengths.
Open-Source Framework: Sakana AI has released TreeQuest, an open-source framework that implements the technique, under the Apache 2.0 license, which permits commercial use.
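
The "deeper versus wider" trade-off can be made concrete with a small toy. The sketch below is not TreeQuest's API and not Sakana AI's implementation; it is a minimal illustration that assumes candidate quality is a score in [0, 1] and uses Thompson sampling over Beta posteriors to choose, at each node, between generating a new child (wider) and descending into an existing child (deeper). The names Node, select, expand, toy_generate, and search are hypothetical, and toy_generate stands in for an LLM call plus a scorer.

```python
import random


class Node:
    """One candidate solution in the search tree (toy version)."""

    def __init__(self, score=None, parent=None):
        self.score = score          # quality in [0, 1]; None for the empty root
        self.parent = parent
        self.children = []
        # Beta pseudo-counts for "generate a new child here" (go wider).
        self.gen_a, self.gen_b = 1.0, 1.0
        # Beta pseudo-counts for "descend into this subtree" (go deeper).
        self.sub_a, self.sub_b = 1.0, 1.0


def select(root, rng):
    """Walk down the tree with Thompson sampling; return the node to expand."""
    node = root
    while True:
        best_child = None
        best_draw = rng.betavariate(node.gen_a, node.gen_b)   # the "wider" arm
        for child in node.children:
            draw = rng.betavariate(child.sub_a, child.sub_b)  # a "deeper" arm
            if draw > best_draw:
                best_child, best_draw = child, draw
        if best_child is None:
            return node             # the "wider" arm won at this node
        node = best_child           # a "deeper" arm won; keep descending


def expand(node, generate, rng):
    """Create one child of `node`, score it, and update the posteriors."""
    score = generate(node.score, rng)
    child = Node(score=score, parent=node)
    child.sub_a += score
    child.sub_b += 1.0 - score
    node.children.append(child)
    # Credit the "wider" arm at the node that was expanded ...
    node.gen_a += score
    node.gen_b += 1.0 - score
    # ... and the "deeper" arms along the path that led to it.
    while node.parent is not None:
        node.sub_a += score
        node.sub_b += 1.0 - score
        node = node.parent
    return child


def toy_generate(parent_score, rng):
    """Hypothetical stand-in for an LLM call followed by a scoring function."""
    if parent_score is None:        # fresh attempt from scratch
        return rng.uniform(0.0, 0.6)
    # Refinement: a small random change around the parent's score.
    return min(1.0, max(0.0, parent_score + rng.uniform(-0.1, 0.2)))


def search(budget, seed=0):
    """Run `budget` expansions and return the root and the best node found."""
    rng = random.Random(seed)
    root = Node()
    best = None
    for _ in range(budget):
        child = expand(select(root, rng), toy_generate, rng)
        if best is None or child.score > best.score:
            best = child
    return root, best
```

Calling search(30) spends thirty generation calls and returns the tree together with its highest-scoring node. Branches that keep producing good scores get sampled more often, while a run of weak refinements shifts probability back toward fresh attempts, which is the balance the article describes.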

Performance Highlights

ARC-AGI-2 Benchmark: The multi-model system solved 30% of the 120 test problems, a significant improvement over individual models.
Error Correction: In one instance, a flawed solution from o4-mini was corrected by Gemini 2.5 Pro and DeepSeek-R1, demonstrating the system's ability to combine models for superior results.
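
One way to picture this hand-off is as a bandit problem: start from a flawed draft, then let the system learn which model's refinements tend to help. The sketch below is a toy under stated assumptions, not Sakana AI's code. The three model functions are hypothetical stand-ins with made-up behavior (they do not represent o4-mini, Gemini 2.5 Pro, or DeepSeek-R1), solutions are reduced to a score in [0, 1], and the selection rule is plain Thompson sampling with one Beta posterior per model.

```python
import random


# Hypothetical stand-ins for real model calls. Each takes the current best
# score and returns the score of its attempted refinement.
def model_a(score, rng):
    # Assumed to repeat its own mistake: it never improves the draft.
    return score - rng.uniform(0.0, 0.05)


def model_b(score, rng):
    # Assumed to make small, fairly reliable fixes.
    return min(1.0, score + rng.uniform(-0.05, 0.15))


def model_c(score, rng):
    # Assumed to make larger but riskier fixes.
    return min(1.0, score + rng.uniform(-0.10, 0.20))


MODELS = {"model_a": model_a, "model_b": model_b, "model_c": model_c}


def refine_with_bandit(draft_score, models, budget, seed=0):
    """Refine a draft, picking a model each round by Thompson sampling."""
    rng = random.Random(seed)
    posterior = {name: [1.0, 1.0] for name in models}  # Beta(a, b) per model
    picks = {name: 0 for name in models}
    best = draft_score
    for _ in range(budget):
        # Draw one sample per model and use the model with the largest draw.
        name = max(models, key=lambda m: rng.betavariate(*posterior[m]))
        candidate = models[name](best, rng)
        improved = candidate > best
        # A success raises a, a failure raises b, for the chosen model only.
        posterior[name][0 if improved else 1] += 1.0
        picks[name] += 1
        best = max(best, candidate)
    return best, picks


best, picks = refine_with_bandit(0.3, MODELS, budget=40)
print(best, picks)
```

The returned picks dictionary shows how many rounds each model received. Because a model only earns credit when its refinement beats the current best, a model that keeps repeating the original error should see its share of the budget shrink as the rounds go on.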

[Figure: AB-MCTS vs. individual models]

Real-World Applications

Complex Coding: AB-MCTS has been successfully applied to algorithmic coding tasks.
Latency Optimization: The technique can be used to search for changes that reduce a web service's response latency.
Hallucination Mitigation: By combining models with varying hallucination tendencies, the system achieves better accuracy.

"This approach unlocks the potential of LLMs as a collective intelligence," said Takuya Akiba, a research scientist at Sakana AI. The release of TreeQuest marks a significant step toward more robust and reliable AI applications for enterprises.

About the Author

Alex Thompson
