Frontier AI Factory

Manufacture intelligence at industrial scale

Forge the AI frontier: trillion-parameter models, trillion-token inference, and efficient orchestration of 1K → 100K+ GPUs.

Request project plan

Why Together AI Factory

Industrial-grade AI infrastructure, custom-built for your AI projects.

NVIDIA Blackwell GPUs

NVIDIA's latest accelerated computing platforms, such as the NVIDIA GB200 NVL72 and NVIDIA HGX B200, tuned for peak performance in both training and inference.

Accelerated software stack

The Together Kernel Collection includes custom NVIDIA CUDA® kernels that raise throughput, reducing training time and cost.

Massive scale

Deploy 1K → 100K+ NVIDIA GPUs across global locations, adapting to evolving workload demands for resilient, enterprise-ready setups.

GPU Clusters, built for production

Choose the right GPUs, deploy with the orchestration stack you prefer, and operate with the observability and security required for production.

- Managed infrastructure
  Pre-configured drivers
  Observability
  Zero setup overhead
  Deploy GPU clusters with integrated observability, managed orchestration, drivers, and networking entirely pre-configured. Run production workloads instantly without manual infrastructure setup.
  Explore the docs
- Orchestration flexibility
  Kubernetes
  Slurm
  Fully managed
  Deploy Kubernetes for open-source extensibility, or run Slurm for precise hardware control and gang scheduling. Both fully managed.
  Explore the docs
- Self-healing infrastructure
  Acceptance testing
  Automated remediation
  Health checks
  Keep workloads running through hardware events using automated remediation and continuous health checks. Every GPU passes rigorous acceptance testing before joining the cluster.
  Explore the docs

Frontier research-powered training performance

The Together Kernel Collection, built by our Chief Scientist Tri Dao (creator of FlashAttention), delivers improved training and inference performance.

TKC
ThunderKittens

AI Training Performance: NVIDIA Hopper to Blackwell, with TKC
TKC vs SOTA Approaches
90% faster training
Training a 70B-parameter Llama-architecture model (BF16) with an optimized TorchTitan + Together Kernel Collection (TKC) stack reached 15,264 tokens/second/GPU on NVIDIA HGX B200, up from 8,080 tokens/second/GPU on NVIDIA HGX H100, a roughly 90% jump in training throughput.
learn more
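As a quick check of the headline figure, the sketch below recomputes the relative gain from the two per-GPU throughput numbers quoted above. The function names are illustrative only, and the time-saved estimate assumes a fixed token budget with throughput as the only variable, which is a simplification rather than a measured result.

```python
def speedup_percent(new_rate: float, old_rate: float) -> float:
    """Relative throughput gain of new_rate over old_rate, in percent."""
    if old_rate <= 0:
        raise ValueError("old_rate must be positive")
    return (new_rate / old_rate - 1.0) * 100.0


def time_saved_percent(new_rate: float, old_rate: float) -> float:
    """Wall-clock reduction for a fixed token budget, in percent.

    Assumes training time scales as tokens / throughput and nothing else changes.
    """
    if new_rate <= 0:
        raise ValueError("new_rate must be positive")
    return (1.0 - old_rate / new_rate) * 100.0


# Per-GPU throughput figures quoted in the text (tokens/second/GPU).
B200_TOKENS_PER_SEC = 15_264
H100_TOKENS_PER_SEC = 8_080

gain = speedup_percent(B200_TOKENS_PER_SEC, H100_TOKENS_PER_SEC)
saved = time_saved_percent(B200_TOKENS_PER_SEC, H100_TOKENS_PER_SEC)
print(f"Throughput gain: {gain:.1f}%")
print(f"Time saved on a fixed token budget: {saved:.1f}%")
```

The gain comes out just under 90%, which the page rounds to 90%. Under the fixed-budget assumption, the same run would take a little more than half the wall-clock time.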
FP8 GEMM Performance (M x N x K)
- ThunderKittens B200
- cuBLAS H100
- cuBLAS B200
ThunderKittens vs cuBLAS
~2× faster
ThunderKittens’ FP8 kernel for NVIDIA HGX B200 matches NVIDIA cuBLAS GEMM performance while delivering ~2× speedup over H100 FP8 GEMMs, leveraging Blackwell’s Tensor Core–accelerated matrix operations.
learn more
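For readers who want to compare GEMM kernels themselves, the sketch below shows how a throughput figure is typically derived for an M x N x K problem: a dense matrix multiply performs 2·M·N·K floating-point operations, which is then divided by the measured kernel time. The helper names and the timing value are placeholders, not measurements from ThunderKittens or cuBLAS.

```python
def gemm_flops(m: int, n: int, k: int) -> int:
    """Floating-point operations in a dense (M x K) @ (K x N) multiply.

    Each of the M*N outputs needs K multiplies and K adds, hence 2*M*N*K.
    """
    return 2 * m * n * k


def achieved_tflops(m: int, n: int, k: int, seconds: float) -> float:
    """Achieved throughput in TFLOP/s for one GEMM that took `seconds`."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return gemm_flops(m, n, k) / seconds / 1e12


# Placeholder timing for illustration only; NOT a benchmark result.
PLACEHOLDER_SECONDS = 1.0e-3
print(f"{achieved_tflops(8192, 8192, 8192, PLACEHOLDER_SECONDS):.1f} TFLOP/s")
```

Comparing two kernels on the same shape then reduces to the ratio of their measured times, since the operation count is identical.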

Orchestration flexibility for your AI workloads

Self-serve GPUs with hourly pricing per GPU.

Managed Kubernetes
For training and inference
Kubeadm-based upstream-compliant K8s
Node autoscaling for elastic compute
Managed Grafana for observability
Flexible ingress configuration for inference
HA control plane with managed upgrades
Cert Manager for HTTPS endpoints
Get started
Slurm on Kubernetes
For training workloads
Precise hardware control and gang scheduling
Submit jobs via srun or sbatch
Direct SSH access with Slurm simplicity and K8s-backed resilience
Essential for distributed training synchronization
Get started
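To make the Slurm workflow above concrete, here is a minimal sketch that assembles an sbatch command for a multi-node training job. Requesting all nodes in one job is what gives gang scheduling: Slurm starts the job only when every requested node is available. The script name, job name, and node counts are hypothetical, and the use of a GRES named `gpu` is an assumption about how a given cluster is configured.

```python
import shlex
from typing import List


def build_sbatch_command(
    script: str, nodes: int, gpus_per_node: int, job_name: str
) -> List[str]:
    """Build an sbatch argv that requests whole nodes for one distributed job.

    Uses standard sbatch options; one task per GPU is a common layout for
    data-parallel training, not a requirement.
    """
    if nodes < 1 or gpus_per_node < 1:
        raise ValueError("nodes and gpus_per_node must be at least 1")
    return [
        "sbatch",
        f"--job-name={job_name}",
        f"--nodes={nodes}",
        f"--ntasks-per-node={gpus_per_node}",
        f"--gres=gpu:{gpus_per_node}",  # assumes a GRES called "gpu" exists
        script,
    ]


# Hypothetical job: 4 nodes with 8 GPUs each, launched from train.sbatch.
cmd = build_sbatch_command("train.sbatch", nodes=4, gpus_per_node=8, job_name="llama-pretrain")
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` on a login node. Inside the batch script, `srun` would then launch one process per task across the allocated nodes.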

Regions and availability zones

USA
2 GW+ in the portfolio, with 600 MW of near-term capacity in the US.
Europe
150 MW+ available across Europe: the UK, Spain, France, Portugal, and Iceland.
Asia & Middle East
Options in Asia and the Middle East are available depending on project scale.

Choose from global regions to meet data residency and compliance requirements, such as HIPAA for healthcare, GDPR in Europe, or banking regulations.

Infrastructure you can trust at scale.
Production-grade security.

We take security and compliance seriously, with strict data privacy controls to keep your information protected. Your data and models remain fully under your ownership, safeguarded by robust security measures.

Learn More

As an NVIDIA Cloud Partner, Together builds and operates clusters on NVIDIA NCP reference architectures for predictable performance and faster time to production. Your data and models remain under your control with strict privacy safeguards and SOC 2–compliant security practices.

NVIDIA preferred partner
AICPA SOC 2 Type II

Customers running inference in production

View All Stories

"Training our omnimodal Character-3 model required infrastructure designed for large-scale AI. The Together Frontier AI Factory delivered the performance we needed to push the boundaries of multimodal video generation. Together AI understands what builders need — and that made all the difference."

Michael Lingelbach

CEO, Hedra

“Together AI provides the performance and reliability we need for real-time, high-quality image and video generation at scale. We value that Together AI is much more than an infrastructure provider — they're a true innovation partner, enabling us to push creative boundaries without compromise.”

Victor Perez

Co-Founder, Krea
