High-performance LLM inference
NVIDIA H200 GPU clusters optimized for advanced AI inference, generative models, and HPC applications.
Why NVIDIA H200 on Together GPU Clusters?
Leading AI and HPC capabilities. Rapid deployment. Maximum efficiency.
2x performance over H100
Each H200 GPU cluster offers double the inference throughput compared to H100, ideal for deploying LLMs at unprecedented scale.
Enhanced memory bandwidth
With 141GB HBM3e GPU memory and 4.8TB/s bandwidth, H200 significantly accelerates memory-intensive generative AI workloads and HPC applications.
Maximum efficiency and TCO savings
Achieve higher performance within the same power profile as previous-gen GPUs, drastically reducing energy consumption and total cost of ownership.
Seamless multi-GPU scaling
NVIDIA’s NVLink provides 900GB/s of bidirectional bandwidth per GPU, enabling effortless scalability across multi-node deployments for demanding workloads (a minimal launch sketch follows this list).
Fast, reliable deployment
Together AI delivers NVIDIA H200 clusters quickly, without supply-chain delays, enabling your team to begin large-scale model deployment and HPC projects immediately.
Run by researchers who train models
Our research team actively runs and tunes training workloads on NVIDIA H200 systems. You're not just getting hardware — you’re working with experts at the edge of what's possible.
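For teams that want a concrete starting point for the multi-GPU scaling described above, the sketch below shows how a multi-node job is typically launched on an NVLink-connected cluster with standard PyTorch distributed tooling. It is a minimal illustration under stated assumptions, not Together AI's managed stack; the node count, rendezvous endpoint, and placeholder model are assumptions.

```python
# Minimal multi-node launch sketch (illustrative only; not Together AI tooling).
# Run on each node, e.g.:
#   torchrun --nnodes=2 --nproc_per_node=8 \
#            --rdzv_backend=c10d --rdzv_endpoint=<head-node-ip>:29500 train.py
import os

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP


def main():
    # NCCL routes intra-node traffic over NVLink and inter-node traffic over the cluster fabric.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Placeholder model: substitute your own training setup.
    model = DDP(torch.nn.Linear(4096, 4096).to(local_rank), device_ids=[local_rank])
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

    x = torch.randn(8, 4096, device=local_rank)
    loss = model(x).sum()
    loss.backward()  # gradient all-reduce happens here, over NVLink and the inter-node fabric
    optimizer.step()

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

The same launch pattern extends to any NCCL-based training framework scaled across multiple nodes.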

"What truly elevates Together AI to ClusterMax™ Gold is their exceptional support and technical expertise. Together AI’s team, led by Tri Dao — the inventor of FlashAttention — and their Together Kernel Collection (TKC), significantly boost customer performance. We don’t believe the value created by Together AI can be replicated elsewhere without cloning Tri Dao.”
— Dylan Patel, Chief Analyst, SemiAnalysis
Revolutionizing generative AI and HPC performance
NVIDIA H200, powered by the Hopper architecture and enhanced HBM3e memory, is engineered for unparalleled performance in AI inference and HPC.

Accelerated Compute
Advanced inference capabilities: H200 delivers exceptional throughput for large-scale LLM inference, handling models like Llama with superior efficiency (see the illustrative example below).
Ultra-high memory performance: The industry’s first GPU offering 141GB HBM3e memory at 4.8TB/s bandwidth, maximizing performance for memory-intensive tasks.
Optimized energy efficiency: Enhanced energy profiles maintain high performance while significantly lowering operational costs and environmental impact.
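As a rough illustration of the inference workloads described above, here is a hedged sketch using the open-source vLLM library to shard a model across eight GPUs in a single node; the model ID and parallelism degree are assumptions, not a prescribed Together AI configuration.

```python
# Illustrative tensor-parallel inference sketch with vLLM (not Together AI's serving stack).
from vllm import LLM, SamplingParams

# Shard the model across 8 GPUs in one node; 141GB of HBM3e per GPU leaves
# headroom for large weights and long-context KV caches without offloading.
llm = LLM(
    model="meta-llama/Llama-3.1-70B-Instruct",  # placeholder model ID
    tensor_parallel_size=8,
)

sampling = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["Explain why memory bandwidth matters for LLM inference."], sampling)
print(outputs[0].outputs[0].text)
```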
Benchmark highlights: inference performance relative to H100, MILC (HPC) performance, and memory bandwidth relative to H100.
Technical Specs
NVIDIA H200
AI Data Centers and Power across North America
Data Center Portfolio
2GW+ in the portfolio, with 600MW of near-term capacity.

Expansion Capability in Europe and Beyond
Data Center Portfolio
150MW+ available across Europe, including the UK, Spain, France, Portugal, and Iceland.

Next Frontiers – Asia and the Middle East
Data Center Portfolio
Options available based on project scale across Asia and the Middle East.

Powering AI Pioneers
Leading AI companies are ramping up with NVIDIA Blackwell running on Together AI.
Zoom partnered with Together AI to leverage our research and deliver accelerated performance when training the models powering various Zoom AI Companion features.
With Together GPU Clusters accelerated by NVIDIA HGX B200, Zoom experienced a 1.9x improvement in training speeds out of the box over previous-generation NVIDIA Hopper GPUs.
Salesforce leverages Together AI across the entire AI journey, from training to fine-tuning to inference of their models, to deliver Agentforce.
Training a Mistral-24B model, Salesforce saw a 2x improvement in training speed when upgrading from NVIDIA HGX H200 to HGX B200. This is accelerating how Salesforce trains models and integrates research results into Agentforce.
During initial tests with NVIDIA HGX B200, InVideo immediately saw a 25% improvement when moving a training job from NVIDIA HGX H200.
Then, in partnership with our researchers, the team made further optimizations and more than doubled this improvement, making the step up to the NVIDIA Blackwell platform even more appealing.
Our latest research & content
Learn more about running turbocharged NVIDIA GB200 NVL72 GPU clusters on Together AI.