Amazon's Project Rainier AI Supercluster for Anthropic Unveiled
Amazon Web Services is building a massive AI supercluster called Project Rainier for Anthropic, featuring hundreds of thousands of Trainium2 accelerators.
Amazon Web Services (AWS) is constructing a massive AI supercluster dubbed Project Rainier, designed to power Anthropic's AI models. The system, set to launch later this year, will span multiple sites across the U.S., including a 30-datacenter facility in Indiana consuming over 2.2 gigawatts of power. Unlike rival systems such as OpenAI's Stargate or xAI's Colossus, Project Rainier relies on Amazon's custom Trainium2 accelerators instead of GPUs.
The Heart of Project Rainier: Trainium2 Accelerators
At the core of Project Rainier are Amazon's Trainium2 chips, unveiled in December 2024. Each accelerator features:
- 1.3 petaFLOPS of dense FP8 performance
- 96GB of HBM and 2.9TB/s of memory bandwidth
- Support for 4x sparsity, boosting FP8 performance to 5.2 petaFLOPS
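The sparse figure follows directly from the dense one; a quick arithmetic check using the per-chip numbers above:

```python
# Per-chip Trainium2 figures quoted above.
dense_pflops = 1.3    # dense FP8 petaFLOPS
sparsity_factor = 4   # 4x structured-sparsity speedup

sparse_pflops = dense_pflops * sparsity_factor
print(f"sparse FP8: {sparse_pflops:.1f} petaFLOPS")  # 5.2 petaFLOPS
```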
While Trainium2 lags behind Nvidia's B200 in raw performance, AWS emphasizes "goodput" (useful training throughput, factoring in efficiency and uptime) as the key metric for large-scale clusters.
Trn2 Instances and UltraServers
AWS packages Trainium2 into Trn2 instances, each with:
- 16 accelerators
- 1536GB of HBM
- 46.4TB/s of memory bandwidth
Four Trn2 instances can be combined into an UltraServer, creating a 64-chip compute domain with 6.1TB of HBM and 186TB/s of aggregate memory bandwidth. The UltraServer links its chips in a 3D torus topology, eliminating the need for network switches within the domain and allowing the system to remain air-cooled.
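In a 3D torus, every chip connects directly to its six nearest neighbors, with links wrapping around at the edges, which is why no switch is needed inside the domain. A minimal sketch of the topology (the 4x4x4 arrangement is an assumption for the 64-chip domain; AWS has not published the exact dimensions):

```python
from itertools import product

def torus_neighbors(coord, dims=(4, 4, 4)):
    """Return the six direct neighbors of a chip in a 3D torus.

    Links wrap modulo each torus dimension, so chips on the edge
    have the same number of point-to-point links as interior chips.
    """
    neighbors = []
    for axis, size in enumerate(dims):
        for step in (-1, 1):
            n = list(coord)
            n[axis] = (n[axis] + step) % size
            neighbors.append(tuple(n))
    return neighbors

# Every one of the 64 positions has exactly 6 point-to-point links.
assert all(len(torus_neighbors(c)) == 6 for c in product(range(4), repeat=3))
# Wrap-around: chip (0, 0, 0) reaches (3, 0, 0) directly.
assert (3, 0, 0) in torus_neighbors((0, 0, 0))
```

The trade-off is that traffic between non-adjacent chips must hop through intermediate ones, which suits the regular, neighbor-heavy communication patterns of large training jobs.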
Scaling to Hundreds of Thousands of Chips
Project Rainier aims to connect tens of thousands of UltraServers, potentially reaching hundreds of thousands of Trainium2 accelerators. A cluster of 256,000 chips would require 250-300 megawatts of power, comparable to xAI's Colossus.
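The per-chip power implied by that estimate is easy to back out (a rough envelope only, since the cluster-level figure bundles cooling and facility overhead in with the accelerators):

```python
chips = 256_000
low_mw, high_mw = 250, 300  # cluster power estimate, megawatts

low_w = low_mw * 1e6 / chips    # ~977 W per chip
high_w = high_mw * 1e6 / chips  # ~1172 W per chip
print(f"{low_w:.0f}-{high_w:.0f} W per accelerator, all-in")
```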
Future Upgrades: Trainium3 on the Horizon
AWS has already teased Trainium3, built on TSMC's 3nm process, promising 40% better efficiency and 4x the performance of Trn2 UltraServers. That would work out to roughly 1.33 exaFLOPS of sparse FP8 performance per UltraServer.
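The exaFLOPS figure checks out against the Trn2 numbers above (assuming the 4x multiplier applies to sparse FP8 throughput at the UltraServer level):

```python
trn2_chip_sparse_pflops = 5.2   # per Trainium2 accelerator
chips_per_ultraserver = 64
trn3_speedup = 4                # AWS's claimed multiplier over Trn2

trn2_ultra = trn2_chip_sparse_pflops * chips_per_ultraserver  # 332.8 PF
trn3_ultra = trn2_ultra * trn3_speedup                        # 1331.2 PF
print(f"{trn3_ultra / 1000:.2f} exaFLOPS sparse FP8")         # 1.33 exaFLOPS
```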
Why It Matters
- Anthropic, backed by Amazon's $8 billion investment, gains a competitive edge in the AI race.
- AWS showcases its ability to build custom silicon at scale, challenging Nvidia's dominance.
- The project highlights the insatiable demand for AI compute, with power consumption rivaling small cities.
For more details, check out AWS's blog post or the original Trainium2 announcement.
Image: each Trainium2 package features a pair of 5nm compute dies flanked by high-bandwidth memory.