Accelerated Compute

Research-optimized GPU compute for AI Natives and enterprises

Train, fine-tune, and deploy on self-service GPU clusters optimized by frontier research — with flexible pricing, production reliability, and security built in.

Why Together Accelerated Compute?

Research-led performance, NVIDIA-validated architecture, and enterprise-grade reliability to train, fine-tune, and serve at any scale.

Faster training & inference

Together Kernel Collection delivers custom CUDA kernels from the FlashAttention team — 90% faster training on NVIDIA Blackwell GPUs with software-led optimization on top of NVIDIA accelerated hardware.

Predictable performance at scale

Go from zero to running workloads in minutes with batteries-included GPU clusters. Virtualized for elasticity, bare-metal where it matters — GPU and network performance without compromise. Pre-configured drivers, built-in observability, and managed orchestration included.

Production reliability & security

Self-healing infrastructure with 99.9% uptime, backed by SOC 2 Type II encryption in transit/at rest, and tenant-level isolation.

Performance-optimized NVIDIA hardware

  • NVIDIA GB200 NVL72
    Memory
    192GB HBM3e per GPU
    Use Case
    Heavy training and inference at frontier scale
    Availability
    Available now through AI Factory
    Learn how
  • NVIDIA B200
    Memory
     192GB HBM3e
    Use Case
    Blackwell architecture
    AI reasoning and training workloads
    Availability
    Available now through GPU Clusters
    Learn how
  • NVIDIA H200
    Memory
    141GB HBM3e
    Use Case
    Extended memory
    Large model training and inference
    Availability
    Available now
    Starting Price
    From $2.09/GPU-hour
    Learn how
  • NVIDIA H100
    Memory
    80GB HBM2e
    Use Case
    Proven performance
    Foundation model workloads
    Availability
    Available now
    Starting Price
    From $1.75/GPU-hour
    Learn how

GPU Clusters, built for production

Choose the right GPUs, deploy with the orchestration stack you prefer, and operate with the observability and security required for production.

    • Managed infrastructure

      Pre-configured drivers
      Observability
      Zero setup overhead

      Deploy GPU clusters with integrated observability, managed orchestration, drivers, and networking entirely pre-configured. Run production workloads instantly without manual infrastructure setup.

    • Orchestration flexibility

      Kubernetes
      Slurm
      Fully managed

      Deploy Kubernetes for open-source extensibility, or run Slurm for precise hardware control and gang scheduling. Both fully managed.

    • Self-healing infrastructure

      Acceptance testing
      Automated remediation
      Health checks

      Keep workloads running through hardware events using automated remediation and continuous health checks. Every GPU passes rigorous acceptance testing before joining the cluster.

    • Flexible pricing modes

      No commitments
      Guaranteed capacity
      Self-serve reservations

      On-demand for flexibility, reserved for guaranteed capacity and better rates. Both fully self-serve.

Frontier research-powered training performance

The Together Kernel Collection, built by our Chief Scientist Tri Dao (creator of FlashAttention), delivers improved training and inference performance.

  • TKC
  • ThunderKittens
  • AI Training Performance: NVIDIA Hopper to Blackwell, with TKC

    TKC vs SOTA Approaches

    90% faster training

    Training a 70B parameter Llama-architecture model (BF16) with an optimized TorchTitan + Together Kernel Collection (TKC) reached 15,264 tokens/second/GPU on NVIDIA HGX B200, up from 8,080 tokens/second on NVIDIA HGX H100—a 90% jump in training speed.

    learn more
  • FP8 GEMM Performance (M x N x K)

    • ThunderKittens B200
    • cuBLAS H100
    • cuBLASB200

    ThunderKittens vs cuBLAS

    ~2× faster

    ThunderKittens’ FP8 kernel for NVIDIA HGX B200 matches NVIDIA cuBLAS GEMM performance while delivering ~2× speedup over H100 FP8 GEMMs, leveraging Blackwell’s Tensor Core–accelerated matrix operations.

    learn more

Deployments 
for any scale

  • GPU Clusters
    GPU clusters at scale

    Spin up clusters in minutes with Kubernetes or Slurm. Choose on-demand or reserved capacity. Scale from 8 GPUs to 4,000+.

    Explore GPU Clusters
  • AI Factory
    Custom infrastructure at frontier scale

    Bespoke infrastructure at factory scale, starting from 1,000+ GPUs. Powered by NVIDIA accelerated compute with Together's research team continuously optimizing performance for your AI workloads.

    Contact Sales

Regions and availability zones

Launch close to your users and data across 25+ cities.

  • USA
    2GW+ in the portfolio with 600MW of near-term capacity in US.
  • Europe
    150 MW+ available in Europe: UK, Spain, France, Portugal, and Iceland also.
  • Asia & Middle East
    Options available based on the scale of the projects in Asia and the Middle East.

Choose from global regions to meet data residency and compliance requirements—HIPAA for healthcare, GDPR for Europe, or banking regulations.

Reference-architecture performance.
Production-grade security.

We take security and compliance seriously, with strict data privacy controls to keep your information protected. Your data and models remain fully under your ownership, safeguarded by robust security measures.

Learn More

As an NVIDIA Cloud Partner, Together builds and operates clusters on NVIDIA NCP reference architectures for predictable performance and faster time to production. Your data and models remain under your control with strict privacy safeguards and SOC 2–compliant security practices.

  • NVIDIA preferred partner
  • AICPA SOC 2 Type II

Customers running inference in production

    "Together GPU Clusters provided a combination of amazing training performance, expert support, and the ability to scale to meet our rapid growth to help us serve our growing community of AI creators."

    Demi Guo

    CEO, Pika

      “Together AI provides the performance and reliability we need for real-time, high-quality image and video generation at scale. We value that Together AI is much more than an infrastructure provider — they're a true innovation partner, enabling us to push creative boundaries without compromise.”

      Victor Perez

      Co-Founder, Krea