Amazon's Project Rainier AI Supercluster for Anthropic Unveiled
Amazon Web Services is building a massive AI supercluster called Project Rainier for Anthropic, featuring hundreds of thousands of Trainium2 accelerators.
Amazon Web Services (AWS) is constructing a massive AI supercluster dubbed Project Rainier, designed to power Anthropic's AI models. The system, set to launch later this year, will span multiple sites across the U.S., including a 30-datacenter facility in Indiana consuming over 2.2 gigawatts of power. Unlike rival systems such as OpenAI's Stargate or xAI's Colossus, Project Rainier relies on Amazon's custom Trainium2 accelerators instead of GPUs.
The Heart of Project Rainier: Trainium2 Accelerators
At the core of Project Rainier are Amazon's Trainium2 chips, unveiled in December 2024. Each accelerator features:
- 1.3 petaFLOPS of dense FP8 performance
- 96GB of HBM and 2.9TB/s of memory bandwidth
- Support for 4x sparsity, boosting FP8 performance to 5.2 petaFLOPS
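The sparse figure follows directly from the dense one; a quick arithmetic check using the per-chip numbers above:

```python
# Per-chip Trainium2 figures quoted above.
dense_pflops = 1.3    # dense FP8 petaFLOPS
sparsity_factor = 4   # 4x structured-sparsity speedup

sparse_pflops = dense_pflops * sparsity_factor
print(f"sparse FP8: {sparse_pflops:.1f} petaFLOPS")  # 5.2 petaFLOPS
```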
While Trainium2 lags behind Nvidia's B200 in raw performance, AWS emphasizes "goodput" (useful training throughput, factoring in efficiency and uptime) as the key metric for large-scale clusters.
Trn2 Instances and UltraServers
AWS packages Trainium2 into Trn2 instances, each with:
- 16 accelerators
- 1536GB of HBM
- 46.4TB/s of memory bandwidth
Four Trn2 instances can be combined into an UltraServer, creating a 64-chip compute domain with 6.1TB of HBM and 186TB/s of aggregate memory bandwidth. The UltraServer links its chips in a 3D torus topology, eliminating the need for network switches within the domain and allowing the system to remain air-cooled.
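In a 3D torus, every chip connects directly to its six nearest neighbors, with links wrapping around at the edges, which is why no switch is needed inside the domain. A minimal sketch of the topology (the 4x4x4 arrangement is an assumption for the 64-chip domain; AWS has not published the exact dimensions):

```python
from itertools import product

def torus_neighbors(coord, dims=(4, 4, 4)):
    """Return the six direct neighbors of a chip in a 3D torus.

    Links wrap modulo each torus dimension, so chips on the edge
    have the same number of point-to-point links as interior chips.
    """
    neighbors = []
    for axis, size in enumerate(dims):
        for step in (-1, 1):
            n = list(coord)
            n[axis] = (n[axis] + step) % size
            neighbors.append(tuple(n))
    return neighbors

# Every one of the 64 positions has exactly 6 point-to-point links.
assert all(len(torus_neighbors(c)) == 6 for c in product(range(4), repeat=3))
# Wrap-around: chip (0, 0, 0) reaches (3, 0, 0) directly.
assert (3, 0, 0) in torus_neighbors((0, 0, 0))
```

The trade-off is that traffic between non-adjacent chips must hop through intermediate ones, which suits the regular, neighbor-heavy communication patterns of large training jobs.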
Scaling to Hundreds of Thousands of Chips
Project Rainier aims to connect tens of thousands of UltraServers, potentially reaching hundreds of thousands of Trainium2 accelerators. A cluster of 256,000 chips would require 250-300 megawatts of power, comparable to xAI's Colossus.
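The per-chip power implied by that estimate is easy to back out (a rough envelope only, since the cluster-level figure bundles cooling and facility overhead in with the accelerators):

```python
chips = 256_000
low_mw, high_mw = 250, 300  # cluster power estimate, megawatts

low_w = low_mw * 1e6 / chips    # ~977 W per chip
high_w = high_mw * 1e6 / chips  # ~1172 W per chip
print(f"{low_w:.0f}-{high_w:.0f} W per accelerator, all-in")
```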
Future Upgrades: Trainium3 on the Horizon
AWS has already teased Trainium3, built on TSMC's 3nm process, promising 40% better efficiency and 4x the performance of Trn2 UltraServers. That would work out to roughly 1.33 exaFLOPS of sparse FP8 performance per UltraServer.
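The exaFLOPS figure checks out against the Trn2 numbers above (assuming the 4x multiplier applies to sparse FP8 throughput at the UltraServer level):

```python
trn2_chip_sparse_pflops = 5.2   # per Trainium2 accelerator
chips_per_ultraserver = 64
trn3_speedup = 4                # AWS's claimed multiplier over Trn2

trn2_ultra = trn2_chip_sparse_pflops * chips_per_ultraserver  # 332.8 PF
trn3_ultra = trn2_ultra * trn3_speedup                        # 1331.2 PF
print(f"{trn3_ultra / 1000:.2f} exaFLOPS sparse FP8")         # 1.33 exaFLOPS
```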
Why It Matters
- Anthropic, backed by Amazon's $8 billion investment, gains a competitive edge in the AI race.
- AWS showcases its ability to build custom silicon at scale, challenging Nvidia's dominance.
- The project highlights the insatiable demand for AI compute, with power consumption rivaling small cities.
For more details, check out AWS's blog post or the original Trainium2 announcement.
Image: each Trainium2 package features a pair of 5nm compute dies flanked by high-bandwidth memory.