Accelerated Compute

Research-optimized GPU compute for AI Natives and enterprises

Train, fine-tune, and deploy on self-service GPU clusters optimized by frontier research — with flexible pricing, production reliability, and security built in.

Explore GPU Clusters

Contact Sales

Why Together Accelerated Compute?

Research-led performance, NVIDIA-validated architecture, and enterprise-grade reliability to train, fine-tune, and serve at any scale.

Faster training & inference

The Together Kernel Collection (TKC) provides custom CUDA kernels from the team behind FlashAttention. Running this software-led optimization on top of NVIDIA accelerated hardware delivers up to 90% faster training on NVIDIA Blackwell GPUs than on NVIDIA Hopper GPUs.

Predictable performance at scale

Go from zero to running workloads in minutes with batteries-included GPU clusters. Clusters are virtualized for elasticity and bare-metal where it matters, so GPU and network performance is never compromised. Pre-configured drivers, built-in observability, and managed orchestration are included.

Production reliability & security

Self-healing infrastructure with 99.9% uptime, SOC 2 Type II compliance, encryption in transit and at rest, and tenant-level isolation.

Performance-optimized NVIDIA hardware

NVIDIA GB200 NVL72
- Memory: 192GB HBM3e per GPU
- Use case: Heavy training and inference at frontier scale
- Availability: Available now through AI Factory

NVIDIA B200
- Memory: 192GB HBM3e
- Use case: AI reasoning and training workloads (Blackwell architecture)
- Availability: Available now through GPU Clusters

NVIDIA H200
- Memory: 141GB HBM3e
- Use case: Large model training and inference (extended memory)
- Availability: Available now
- Starting price: From $3.99/GPU-hour

NVIDIA H100
- Memory: 80GB HBM2e
- Use case: Foundation model workloads (proven performance)
- Availability: Available now
- Starting price: From $3.09/GPU-hour
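As a rough illustration of how per-GPU-hour pricing adds up, here is the arithmetic for a hypothetical 8-GPU cluster (the smallest size listed under GPU Clusters) running for 24 hours at the starting rates above. This is a sketch only: actual cost depends on the pricing mode, reservation terms, and configuration.

$$
8\ \text{GPUs} \times 24\ \text{h} \times \$3.09/\text{GPU-hour} = \$593.28 \quad \text{(H100)}
$$

$$
8\ \text{GPUs} \times 24\ \text{h} \times \$3.99/\text{GPU-hour} = \$766.08 \quad \text{(H200)}
$$

Because the listed prices are starting rates, these figures are lower bounds for this assumed configuration.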

GPU Clusters, built for production

Choose the right GPUs, deploy with the orchestration stack you prefer, and operate with the observability and security required for production.

- Managed infrastructure: pre-configured drivers, observability, zero setup overhead
  Deploy GPU clusters with integrated observability, managed orchestration, drivers, and networking, all pre-configured. Run production workloads immediately, without manual infrastructure setup.
- Orchestration flexibility: Kubernetes, Slurm, fully managed
  Deploy Kubernetes for open-source extensibility, or run Slurm for precise hardware control and gang scheduling. Both are fully managed.
- Self-healing infrastructure: acceptance testing, automated remediation, health checks
  Keep workloads running through hardware events with automated remediation and continuous health checks. Every GPU passes rigorous acceptance testing before it joins the cluster.
- Flexible pricing modes: no commitments, guaranteed capacity, self-serve reservations
  Choose on-demand capacity for flexibility or reserved capacity for guaranteed availability and better rates. Both are fully self-serve.

Frontier research-powered training performance

The Together Kernel Collection, built by our Chief Scientist Tri Dao (creator of FlashAttention), delivers improved training and inference performance.


AI training performance: NVIDIA Hopper to Blackwell, with TKC
90% faster training
Training a 70B-parameter Llama-architecture model (BF16) with an optimized TorchTitan + Together Kernel Collection (TKC) stack reached 15,264 tokens/second/GPU on NVIDIA HGX B200, up from 8,080 tokens/second/GPU on NVIDIA HGX H100, a roughly 90% increase in training throughput.
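As a quick check, the headline figure is the ratio of the two per-GPU throughputs quoted above, minus one, rounded:

$$
\frac{15{,}264\ \text{tokens/s/GPU}}{8{,}080\ \text{tokens/s/GPU}} \approx 1.889
\quad\Rightarrow\quad
(1.889 - 1) \times 100\% \approx 89\% \approx 90\%
$$

In other words, in this benchmark HGX B200 delivers about 1.89× the per-GPU training throughput of HGX H100.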
[Figure: FP8 GEMM performance across matrix sizes (M × N × K); series: ThunderKittens on B200, cuBLAS on H100, cuBLAS on B200]
ThunderKittens vs cuBLAS
~2× faster
ThunderKittens’ FP8 kernel for NVIDIA HGX B200 matches NVIDIA cuBLAS FP8 GEMM performance on B200 and delivers a ~2× speedup over FP8 GEMMs on H100, taking advantage of Blackwell’s Tensor Core–accelerated matrix operations.

Deployments for any scale

GPU Clusters
GPU clusters at scale
Spin up clusters in minutes with Kubernetes or Slurm. Choose on-demand or reserved capacity. Scale from 8 GPUs to 4,000+.
Explore GPU Clusters
AI Factory
Custom infrastructure at frontier scale
Bespoke infrastructure at factory scale, starting at 1,000 GPUs. Powered by NVIDIA accelerated compute, with Together's research team continuously optimizing performance for your AI workloads.
Contact Sales

Regions and availability zones

Launch close to your users and data across 25+ cities.

USA
More than 2 GW of capacity in the portfolio, with 600 MW of near-term capacity in the US.
Europe
More than 150 MW available in Europe, across the UK, Spain, France, Portugal, and Iceland.
Asia & Middle East
Options are available in Asia and the Middle East, depending on project scale.

Choose from global regions to meet data residency and compliance requirements, such as HIPAA for healthcare, GDPR in Europe, or banking regulations.

Reference-architecture performance.
Production-grade security.

We take security and compliance seriously, with strict data privacy controls to keep your information protected. Your data and models remain fully under your ownership, safeguarded by robust security measures.


As an NVIDIA Cloud Partner (NCP), Together builds and operates clusters on NVIDIA NCP reference architectures for predictable performance and faster time to production. Your data and models remain under your control, protected by strict privacy safeguards and SOC 2–compliant security practices.

preferred partner
SOC 2 Type II
ISO 27001:2022

Customers running inference in production

View All Stories

"Together GPU Clusters provided a combination of amazing training performance, expert support, and the ability to scale to meet our rapid growth to help us serve our growing community of AI creators."

Demi Guo

CEO, Pika

“Together AI provides the performance and reliability we need for real-time, high-quality image and video generation at scale. We value that Together AI is much more than an infrastructure provider — they're a true innovation partner, enabling us to push creative boundaries without compromise.”

Victor Perez

Co-Founder, Krea
