NVIDIA GB200 NVL72
NVIDIA Blackwell platform has arrived on Together AI.
We’re here to help AI pioneers train trillion-parameter models on turbocharged NVIDIA GB200 NVL72 GPU clusters, powered by our research and expert ops.
Why NVIDIA GB200 NVL72 on Together GPU Clusters?
The world’s most powerful AI infrastructure. Delivered faster. Tuned smarter.
Train trillion-parameter models on a single, unified NVLink domain
NVIDIA GB200 NVL72 connects 72 Blackwell GPUs and 36 Grace CPUs into one liquid-cooled, memory-coherent rack — enabling tightly synchronized, low-latency training at massive scale.
Custom networking on top of 130 TB/s of intra-rack bandwidth
We extend the GB200's 5th-gen NVLink performance with tailored fabric designs — Clos topologies for dense LLMs and oversubscription for MoEs — built on InfiniBand or high-speed Ethernet with full observability.
AI-native shared storage built for checkpoints at scale
Whether you’re running long-horizon training or restarting from a failure, we support AI-native storage systems like VAST and Weka for high-throughput, parallel access to massive datasets and model state.
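As an illustration of why high-throughput parallel access matters for checkpointing, here is a minimal Python sketch (file names, shard count, and sizes are invented for the example) in which each rank writes its own checkpoint shard concurrently — the access pattern a parallel filesystem such as VAST or Weka can serve at full aggregate bandwidth:

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

def write_shard(ckpt_dir, rank, data):
    # Each rank writes its own shard; on a parallel filesystem these
    # writes spread across storage servers and proceed concurrently.
    shard_path = os.path.join(ckpt_dir, f"shard_rank{rank}.bin")
    with open(shard_path, "wb") as f:
        f.write(data)
    return shard_path

def save_checkpoint(ckpt_dir, shards):
    # Fan out one writer per shard instead of funneling all model
    # state through a single process.
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        return list(pool.map(lambda rs: write_shard(ckpt_dir, *rs),
                             enumerate(shards)))

ckpt_dir = tempfile.mkdtemp()
paths = save_checkpoint(ckpt_dir, [b"\x00" * 1024] * 4)  # 4 toy 1 KiB shards
```

Restart-from-failure follows the same pattern in reverse: each rank reads back only its own shard, so recovery time scales with per-rank shard size rather than total model size.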
NVIDIA Grace + Blackwell orchestration, pre-integrated
We validate your Slurm or Kubernetes stack on the actual NVLink and NVSwitch layout to avoid job placement issues and scheduling inefficiencies.
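One concrete form this validation can take is topology-aware scheduling. Below is a minimal sketch of a Slurm `topology.conf` (node and switch names are hypothetical) that models each NVL72 rack as a single switch, so the scheduler keeps a job's ranks inside one NVLink domain:

```
# topology.conf — one "switch" per NVL72 rack (illustrative names)
SwitchName=nvl72-rack0 Nodes=node[000-017]
SwitchName=nvl72-rack1 Nodes=node[018-035]
SwitchName=spine Switches=nvl72-rack[0-1]
```

Submitting with `sbatch --switches=1` then asks Slurm to place the job entirely under one such rack-level switch, avoiding placements that straddle NVLink domains.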
Delivery in 4–6 weeks, no NVIDIA lottery required
We ship full-rack NVIDIA GB200 NVL72 clusters — not just dev kits — with thousands of GPUs available now. You don’t wait on backorders. You start training.
Run by the same people pushing the Blackwell stack forward
Work with engineers who co-develop Blackwell optimizations; our team continually tunes workloads and publishes cutting-edge training research.

“What truly elevates Together AI to ClusterMax™ Gold is their exceptional support and technical expertise. Together AI’s team, led by Tri Dao — the inventor of FlashAttention — and their Together Kernel Collection (TKC), significantly boost customer performance. We don’t believe the value created by Together AI can be replicated elsewhere without cloning Tri Dao.”
— Dylan Patel, Chief Analyst, SemiAnalysis
Powering reasoning models and AI agents
The NVIDIA GB200 NVL72 is an exascale computer in a single rack, unlocking training and inference of frontier trillion-parameter models.

SOTA GPU Compute
Rack-scale powerhouse: 72 Blackwell GPUs and 36 Grace CPUs are fused by NVLink in a liquid-cooled chassis for exceptional bandwidth and thermals.
Unified GPU abstraction: The system appears as one colossal GPU, enabling real-time, trillion-parameter LLM training and inference with minimal orchestration overhead.
GB200 Grace Blackwell Superchip building block: Each module joins two Blackwell GPUs to one Grace CPU via NVLink-C2C, delivering memory-coherent, ultra-low-latency compute.