⚡️ FlashAttention-4: up to 1.3× faster than cuDNN on NVIDIA Blackwell →

Introducing Together AI's new look →

🔎 ATLAS: runtime-learning accelerators delivering up to 4x faster LLM inference →

⚡ Together GPU Clusters: self-service NVIDIA GPUs, now generally available →

📦 Batch Inference API: Process billions of tokens at 50% lower cost for most models →

🪛 Fine-Tuning Platform Upgrades: Larger Models, Longer Contexts →

  • Inference

    • Serverless Inference

      High-performance inference as APIs

    • Batch Inference

      Inference for batch workloads

    • Dedicated Model Inference

      Inference on custom hardware

    • Dedicated Container Inference

      Inference for custom models

    MiniMax M2.5
    Nano Banana Pro
    Qwen3.5-397B
    GLM-5
    Kimi K2.5
    gpt-oss-120b

    Model library

    Explore the top open-source models

  • Compute

    Accelerated Compute

    • GPU Clusters

      Reliable GPU clusters at scale

    • AI Factory

      Custom infrastructure at frontier scale

    Developer Environments

    • Sandbox

      Build development environments for AI

    Storage

    • Managed Storage

      Store model weights & data securely

    • GB300

    • GB200

    • B200

    • H200

    • H100

  • Model Shaping

    • Fine-Tuning

      Shape models with your data

    • Evaluations

      Measure model quality

    DeepSeek V3.1
    GLM 5 FP4
    Qwen3-VL 32B
    gpt-oss-120b
    Kimi K2.5
    Llama 4 Maverick

    Model library

    Fine-tune top open-source models

  • Research

    • Research

      Systems research for production AI

    • Research blog

      All our research publications

    Featured publications

    • FlashAttention

    • ATLAS

    • Kernel Collection

    • ThunderKittens

    • DSGym

  • Developers

    • Documentation

      Technical docs for Together AI

    • Demos

      Our open-source demo apps

    • Cookbooks

      Practical implementation guides

    • Voice Agents

      Build voice agents for production

    • Model Library

    • Playground

    • Together Chat

    • Which LLM to use

  • Company

    Resources

    • Customer stories

      Testimonials from AI Natives

    • Startup accelerator

      Build and scale your startup

    • Customer support

      Find answers to your questions

    • Blog

      Our latest news & blog posts

    • Events

      Explore our events calendar

    Company

    • About

      Get to know us

    • Careers

      Join our mission

  • Pricing


Contact sales

Start building on Together AI

From optimized training and model shaping to large-scale production inference.

Get started now
  • Products

    • Accelerated Compute

    • Serverless Inference

    • Dedicated Inference

    • Fine-Tuning

    • Sandbox

    • Evaluations

  • Models

    See all models

    DeepSeek

    Meta

    Qwen

    Google

    OpenAI

    Mistral AI

    Custom models

  • Developers

    • Research

    • Docs

    Pricing

    • Pricing overview

    • Inference

    • Fine-Tuning

    • GPU Clusters

  • Resources

    • Blog

    • About us

    • Careers

    • Customer Stories

    • Support

  • Privacy Policy

  • Terms of service

© 2026 Together AI. All Rights Reserved.