GPU / H200

NVIDIA H200

High-performance LLM inference

Why H200 on Together GPU Clusters?

The world’s most powerful AI infrastructure. Delivered faster. Tuned smarter.

2x performance over H100

Each H200 GPU cluster offers double the inference throughput compared to H100, ideal for deploying LLMs at unprecedented scale.

Enhanced memory bandwidth

With 141GB HBM3e GPU memory and 4.8TB/s bandwidth, H200 significantly accelerates memory-intensive generative AI workloads and HPC applications.

Maximum efficiency and TCO savings

Achieve higher performance within the same power profile as previous-gen GPUs, drastically reducing energy consumption and total cost of ownership.

Run by researchers who train models

Our research team actively runs and tunes training workloads on NVIDIA H200 systems, giving us hands-on expertise at the edge of what's possible.

What our customers are saying

    "Delivering competitive pricing, strong reliability and a properly set up cluster is the bulk of the value differentiation for most AI clouds. The only differentiated value we have seen outside this set is from a Neocloud called Together AI, where the inventor of FlashAttention, Tri Dao, works. We don't believe the value created by Together can be replicated elsewhere."

    Dylan Patel

    Founder, SemiAnalysis

    "Training our omnimodal Character-3 model required infrastructure designed for large-scale AI. The Together Frontier AI Factory delivered the performance we needed to push the boundaries of multimodal video generation. Together AI understands what builders need — and that made all the difference."

    Michael Lingelbach

    CEO, Hedra

    "Together GPU Clusters provided a combination of amazing training performance, expert support, and the ability to scale to meet our rapid growth to help us serve our growing community of AI creators."

    Demi Guo

    CEO, Pika

    "Together AI provides the performance and reliability we need for real-time, high-quality image and video generation at scale. We value that Together AI is much more than an infrastructure provider — they're a true innovation partner, enabling us to push creative boundaries without compromise."

    Victor Perez

    Co-Founder, Krea

Outstanding specs of H200

Performance
• Faster inference: 2x vs H100
• Faster inference: 110x higher
• Better efficiency: 1.4x vs H100

Salesforce AI Research

    "We've been thoroughly impressed with the Together Enterprise Platform. It has delivered a 2x reduction in latency (time to first token) and cut our costs by approximately a third. These improvements allow us to launch AI-powered features and deliver lightning-fast experiences faster than ever before."

    Caiming Xiong

    VP, Salesforce AI Research

Technical specifications

• Hopper GPUs: 8
• FP8 Tensor Core: 3,958 TFLOPS
• FP16/BF16 Tensor Core: 1,979 TFLOPS
• TF32 Tensor Core: 989 TFLOPS
• GPU memory: 141 GB HBM3e
• GPU memory bandwidth: 4.8 TB/s
• Total NVLink bandwidth: 900 GB/s
• Multi-Instance GPU (MIG): up to 7 (@18 GB each)
• Decoders: 7 NVDEC, 7 JPEG
• Max Thermal Design Power (TDP): configurable up to 700 W
• Interconnect: NVLink 900 GB/s, PCIe Gen5 128 GB/s
• Server options: NVIDIA HGX H200 partner and NVIDIA-Certified Systems with 4 or 8 GPUs
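The memory figures above translate directly into decode speed for LLM serving. As a rough illustration (this is a generic back-of-envelope sketch, not a Together AI benchmark; the 70B model size and FP16 weights are assumptions), single-stream decoding is typically memory-bandwidth-bound, since each generated token must read the full weight set from HBM:

```python
# Back-of-envelope: memory-bandwidth-bound decode throughput on one H200.
# Assumes a dense FP16 model whose weights are read once per generated token;
# model size is illustrative, not a measured Together AI figure.

HBM_BANDWIDTH_TBS = 4.8   # H200 memory bandwidth, TB/s (from the spec list above)
MODEL_PARAMS_B = 70       # hypothetical 70B-parameter dense model
BYTES_PER_PARAM = 2       # FP16/BF16 weights

def decode_tokens_per_second(bandwidth_tbs: float,
                             params_b: float,
                             bytes_per_param: int) -> float:
    """Upper bound on single-stream decode speed: bandwidth / weight bytes."""
    weight_bytes = params_b * 1e9 * bytes_per_param
    return bandwidth_tbs * 1e12 / weight_bytes

print(round(decode_tokens_per_second(HBM_BANDWIDTH_TBS, MODEL_PARAMS_B, BYTES_PER_PARAM)))
# prints 34  (4.8e12 B/s ÷ 1.4e11 B per token pass)
```

Real deployments batch many requests so the same weight read serves multiple streams, which is one reason raw bandwidth (4.8 TB/s here, versus 3.35 TB/s on H100) matters so much for aggregate inference throughput.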

Infrastructure you can trust at scale.
Production-grade security.

We take security and compliance seriously: strict data privacy controls keep your information protected, and your data and models remain fully under your ownership, safeguarded by robust security measures.


As an NVIDIA Cloud Partner, Together builds and operates clusters on NVIDIA NCP reference architectures for predictable performance and faster time to production, backed by SOC 2–compliant security practices.

• NVIDIA preferred partner
• AICPA SOC 2 Type II

Regions and availability zones

Choose from global regions to meet data residency and compliance requirements, including HIPAA for healthcare, GDPR in Europe, and banking regulations.

• USA
  2GW+ in the portfolio, with 600MW of near-term capacity in the US.
• Europe
  150MW+ available across Europe, including the UK, Spain, France, Portugal, and Iceland.
• Asia & Middle East
  Options available based on project scale.