Frontier AI Factory

Manufacture intelligence at industrial scale

Forge the AI frontier: trillion-parameter models, trillion-token inference, and efficient orchestration of 1K → 100K+ GPUs.


Why Together AI Factory

Industrial-grade AI infrastructure, custom-built for your AI projects.

NVIDIA Blackwell GPUs

NVIDIA's latest accelerated computing platforms, including the NVIDIA GB200 NVL72 and NVIDIA HGX B200, tuned for peak performance and supporting both training and inference.

Accelerated software stack

The Together Kernel Collection includes custom NVIDIA CUDA® kernels, reducing training times and costs with superior throughput.

Massive scale

Deploy 1K → 100K+ NVIDIA GPUs across global locations, scaling with evolving workload demands for resilient, enterprise-ready deployments.

GPU Clusters, built for production

Choose the right GPUs, deploy with the orchestration stack you prefer, and operate with the observability and security required for production.

    • Managed infrastructure

      Pre-configured drivers
      Observability
      Zero setup overhead

      Deploy GPU clusters with integrated observability, managed orchestration, drivers, and networking entirely pre-configured. Run production workloads instantly without manual infrastructure setup.

    • Orchestration flexibility

      Kubernetes
      Slurm
      Fully managed

      Deploy Kubernetes for open-source extensibility, or run Slurm for precise hardware control and gang scheduling. Both fully managed.

    • Self-healing infrastructure

      Acceptance testing
      Automated remediation
      Health checks

      Keep workloads running through hardware events using automated remediation and continuous health checks. Every GPU passes rigorous acceptance testing before joining the cluster.
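On a managed Kubernetes cluster like the one described above, GPUs are requested as an ordinary pod resource limit. A minimal sketch, with the container image, pod name, and GPU count all illustrative placeholders rather than Together AI specifics:

```shell
# Hypothetical example: request 8 NVIDIA GPUs for a single pod on a
# managed Kubernetes cluster. Image tag and counts are placeholders.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test
spec:
  restartPolicy: Never
  containers:
    - name: cuda
      image: nvcr.io/nvidia/pytorch:24.08-py3   # placeholder image tag
      command: ["nvidia-smi"]                   # print visible GPUs, then exit
      resources:
        limits:
          nvidia.com/gpu: 8                     # standard NVIDIA device-plugin resource
EOF
```

The `nvidia.com/gpu` resource name is the conventional one exposed by the NVIDIA Kubernetes device plugin; on a pre-configured cluster no driver or plugin setup is needed before a pod like this can schedule.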

Frontier research-powered training performance

The Together Kernel Collection, built by our Chief Scientist Tri Dao (creator of FlashAttention), delivers improved training and inference performance.

  • TKC
  • ThunderKittens
  • AI Training Performance: NVIDIA Hopper to Blackwell, with TKC

    TKC vs SOTA Approaches

    90% faster training

    Training a 70B-parameter Llama-architecture model (BF16) with an optimized TorchTitan + Together Kernel Collection (TKC) stack reached 15,264 tokens/second/GPU on NVIDIA HGX B200, up from 8,080 tokens/second/GPU on NVIDIA HGX H100, a 90% jump in training speed.

    learn more
  • FP8 GEMM Performance (M x N x K)

    • ThunderKittens B200
    • cuBLAS H100
    • cuBLAS B200

    ThunderKittens vs cuBLAS

    ~2× faster

    ThunderKittens’ FP8 kernel for NVIDIA HGX B200 matches NVIDIA cuBLAS GEMM performance while delivering ~2× speedup over H100 FP8 GEMMs, leveraging Blackwell’s Tensor Core–accelerated matrix operations.

    learn more
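The headline training number above can be sanity-checked with quick arithmetic from the two tokens/second/GPU figures quoted on this page; the exact gain is about 89%, rounded up to 90% in the copy:

```shell
#!/bin/sh
# Back-of-the-envelope check of the quoted training speedup.
H100_TPS=8080    # tokens/s/GPU, 70B Llama-style model (BF16) on HGX H100
B200_TPS=15264   # same workload on HGX B200 with TKC
# Integer percent gain; exact value is 88.9%, truncated to 88 here.
GAIN_PCT=$(( (B200_TPS - H100_TPS) * 100 / H100_TPS ))
echo "HGX B200 + TKC: ${GAIN_PCT}% faster"   # prints 88 (exact 88.9%, quoted as 90%)
```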

Orchestration flexibility for your AI workloads

Self-serve GPU clusters with per-GPU hourly pricing.

  • Managed Kubernetes
    For training and inference
    Kubeadm-based, upstream-conformant Kubernetes
    Node autoscaling for elastic compute
    Managed Grafana for observability
    Flexible ingress configuration for inference
    HA control plane with managed upgrades
    Cert Manager for HTTPS endpoints
    Get started
  • Slurm on Kubernetes
    For training workloads
    Precise hardware control and gang scheduling
    Submit jobs via srun and sbatch
    Direct SSH access with Slurm simplicity and K8s-backed resilience
    Essential for distributed training synchronization
    Get started
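As a sketch of the Slurm workflow above, a multi-node training job could be submitted with sbatch roughly as follows; the node count, walltime, and training entrypoint are hypothetical placeholders, not Together AI defaults:

```shell
#!/bin/bash
# Hypothetical sbatch script for a distributed training job.
#SBATCH --job-name=llama-train
#SBATCH --nodes=16               # placeholder node count
#SBATCH --gpus-per-node=8
#SBATCH --ntasks-per-node=8      # one task per GPU
#SBATCH --time=72:00:00

# srun launches all tasks together (gang scheduling), one rank per GPU,
# which is what distributed training synchronization depends on.
srun python train.py --config train_70b.yaml   # placeholder entrypoint
```

Submitted with `sbatch train.sh`, Slurm holds the job until all 16 nodes are available, then starts every rank at once.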

Regions and availability zones

  • USA
    2 GW+ in the portfolio, with 600 MW of near-term capacity in the US.
  • Europe
    150 MW+ available across Europe, including the UK, Spain, France, Portugal, and Iceland.
  • Asia & Middle East
    Options available in Asia and the Middle East based on project scale.

Choose from global regions to meet data residency and compliance requirements—HIPAA for healthcare, GDPR for Europe, or banking regulations.

Infrastructure you can trust at scale.
Production-grade security.

We take security and compliance seriously: strict data privacy controls keep your information protected, and your data and models remain fully under your ownership.

Learn More

As an NVIDIA Cloud Partner, Together builds and operates clusters on NVIDIA NCP reference architectures for predictable performance and faster time to production. Your data and models remain under your control with strict privacy safeguards and SOC 2–compliant security practices.

  • NVIDIA preferred partner
  • AICPA SOC 2 Type II

Customers running training and inference in production

    "Training our omnimodal Character-3 model required infrastructure designed for large-scale AI. The Together Frontier AI Factory delivered the performance we needed to push the boundaries of multimodal video generation. Together AI understands what builders need — and that made all the difference."

    Michael Lingelbach

    CEO, Hedra

    "Together AI provides the performance and reliability we need for real-time, high-quality image and video generation at scale. We value that Together AI is much more than an infrastructure provider — they're a true innovation partner, enabling us to push creative boundaries without compromise."

    Victor Perez

    Co-Founder, Krea