Accelerated Compute
Research-optimized GPU compute for AI Natives and enterprises
Train, fine-tune, and deploy on self-service GPU clusters optimized by frontier research — with flexible pricing, production reliability, and security built in.

Why Together Accelerated Compute?
Research-led performance, NVIDIA-validated architecture, and enterprise-grade reliability to train, fine-tune, and serve at any scale.
Faster training & inference
Together Kernel Collection delivers custom CUDA kernels from the FlashAttention team — 90% faster training on NVIDIA Blackwell GPUs with software-led optimization on top of NVIDIA accelerated hardware.
Predictable performance at scale
Go from zero to running workloads in minutes with batteries-included GPU clusters. Virtualized for elasticity, bare-metal where it matters — GPU and network performance without compromise. Pre-configured drivers, built-in observability, and managed orchestration included.
Production reliability & security
Self-healing infrastructure with 99.9% uptime, backed by SOC 2 Type II encryption in transit/at rest, and tenant-level isolation.
Performance-optimized NVIDIA hardware
- NVIDIA GB200 NVL72Memory192GB HBM3e per GPUUse CaseHeavy training and inference at frontier scaleAvailabilityAvailable now through AI FactoryLearn how
- NVIDIA B200Memory192GB HBM3eUse CaseBlackwell architecture
AI reasoning and training workloadsAvailabilityAvailable now through GPU ClustersLearn how - NVIDIA H200Memory141GB HBM3eUse CaseExtended memory
Large model training and inferenceAvailabilityAvailable nowStarting PriceFrom $2.09/GPU-hourLearn how - NVIDIA H100Memory80GB HBM2eUse CaseProven performance
Foundation model workloadsAvailabilityAvailable nowStarting PriceFrom $1.75/GPU-hourLearn how
GPU Clusters, built for production
Choose the right GPUs, deploy with the orchestration stack you prefer, and operate with the observability and security required for production.
Deploy GPU clusters with integrated observability, managed orchestration, drivers, and networking entirely pre-configured. Run production workloads instantly without manual infrastructure setup.
Deploy Kubernetes for open-source extensibility, or run Slurm for precise hardware control and gang scheduling. Both fully managed.
Keep workloads running through hardware events using automated remediation and continuous health checks. Every GPU passes rigorous acceptance testing before joining the cluster.
On-demand for flexibility, reserved for guaranteed capacity and better rates. Both fully self-serve.




Frontier research-powered training performance
The Together Kernel Collection, built by our Chief Scientist Tri Dao (creator of FlashAttention), delivers improved training and inference performance.
AI Training Performance: NVIDIA Hopper to Blackwell, with TKC
TKC vs SOTA Approaches
90% faster training
Training a 70B parameter Llama-architecture model (BF16) with an optimized TorchTitan + Together Kernel Collection (TKC) reached 15,264 tokens/second/GPU on NVIDIA HGX B200, up from 8,080 tokens/second on NVIDIA HGX H100—a 90% jump in training speed.
learn moreFP8 GEMM Performance (M x N x K)
- ThunderKittens B200
- cuBLAS H100
- cuBLASB200
ThunderKittens vs cuBLAS
~2× faster
ThunderKittens’ FP8 kernel for NVIDIA HGX B200 matches NVIDIA cuBLAS GEMM performance while delivering ~2× speedup over H100 FP8 GEMMs, leveraging Blackwell’s Tensor Core–accelerated matrix operations.
learn more
Deployments for any scale
- GPU ClustersGPU clusters at scale
Spin up clusters in minutes with Kubernetes or Slurm. Choose on-demand or reserved capacity. Scale from 8 GPUs to 4,000+.
Explore GPU Clusters - AI FactoryCustom infrastructure at frontier scale
Bespoke infrastructure at factory scale, starting from 1,000+ GPUs. Powered by NVIDIA accelerated compute with Together's research team continuously optimizing performance for your AI workloads.
Contact Sales
Regions and availability zones
Launch close to your users and data across 25+ cities.
- USA2GW+ in the portfolio with 600MW of near-term capacity in US.
- Europe150 MW+ available in Europe: UK, Spain, France, Portugal, and Iceland also.
- Asia & Middle EastOptions available based on the scale of the projects in Asia and the Middle East.
Choose from global regions to meet data residency and compliance requirements—HIPAA for healthcare, GDPR for Europe, or banking regulations.
Reference-architecture performance.
Production-grade security.
We take security and compliance seriously, with strict data privacy controls to keep your information protected. Your data and models remain fully under your ownership, safeguarded by robust security measures.
NVIDIA preferred partner- AICPA SOC 2 Type II



