

Explore Research

Research blog

Agents
CoderForge-Preview: SOTA open dataset for training efficient coding agents

By Alpay Ariyak*, Junda Zhang, Junxiong Wang, Shang Zhu, Federico Bianchi, Sanjana Srivastava, Ashwinee Panda, Siddhant Bharti, Chenfeng Xu, John Heo, Xiaoxia Shirley Wu, James Zou, Percy Liang, Leon Song, Ce Zhang, Ben Athiwaratkun, Zhongzhu Zhou*, Qingyang Wu* (*project core leads)

Agents
How speech models fail where it matters the most and what to do about it

Kaitlyn Zhou, Martijn Bartelds, Federico Bianchi, James Zou

Inference
Consistency diffusion language models: Up to 14x faster inference without sacrificing quality

Minseo Kim, Chenfeng Xu, Coleman Richard Charles Hooper, Harman Singh, Ben Athiwaratkun, Ce Zhang, Kurt Keutzer, Amir Gholami | Seoul National University, University of California, Berkeley, Together AI

Inference
Cache-aware prefill–decode disaggregation (CPD) for up to 40% faster long-context LLM serving

Jiejing Zhang, Yubo Wang, Yinghui Liu, Mourya Vangala Srinivasa, Chenxi Li, Jue Wang, Yineng Zhang, Shuaiwen Leon Song, Ce Zhang





© 2026 Together AI. All Rights Reserved.