🔎 ATLAS: runtime-learning accelerators delivering up to 4x faster LLM inference →
⚡ Together Instant Clusters: self-service NVIDIA GPUs, now generally available →
📦 Batch Inference API: Process billions of tokens at 50% lower cost for most models →
🪛 Fine-Tuning Platform Upgrades: Larger Models, Longer Contexts →
Model Platform

Products

  • Serverless Inference: API for inference on open-source models (sketch below)
  • Dedicated Endpoints: Deploy models on custom hardware
  • Fine-Tuning: Train & improve high-quality, fast models
  • Evaluations: Measure model quality
  • Together Chat: Chat app for open-source AI

Code Execution

  • Code Sandbox: Build AI development environments
  • Code Interpreter: Execute LLM-generated code

Tools

  • Which LLM to Use: Find the ‘right’ model for your use case
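To make the Serverless Inference entry concrete: a minimal sketch of one chat completion request, assuming Together's OpenAI-compatible REST endpoint at api.together.xyz/v1 and an API key in a TOGETHER_API_KEY environment variable. The model slug is illustrative; substitute any slug from the model library.

```python
# Minimal sketch: one chat completion against Together's serverless
# inference API. Assumes the OpenAI-compatible /v1/chat/completions
# endpoint and a TOGETHER_API_KEY environment variable.
import os
import requests

resp = requests.post(
    "https://api.together.xyz/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['TOGETHER_API_KEY']}"},
    json={
        "model": "openai/gpt-oss-20b",  # illustrative slug; use any from the model library
        "messages": [{"role": "user", "content": "Hello!"}],
    },
    timeout=60,
)
resp.raise_for_status()
# OpenAI-compatible response shape: first choice's message content.
print(resp.json()["choices"][0]["message"]["content"])
```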

Models

See all models →

  • OpenAI gpt-oss
  • DeepSeek
  • Qwen
  • Llama
  • Kimi K2
  • GLM-4.7
  • Apriel
GPU Cloud

Clusters of Any Size

  • Instant Clusters: Ready-to-use, self-service GPUs
  • Reserved Clusters: Dedicated capacity, with expert support
  • Frontier AI Factory: 1K → 10K → 100K+ NVIDIA GPUs

Cloud Services

  • Data Center Locations: Global GPU power in 25+ cities
  • Slurm: Cluster management system

GPUs

  • NVIDIA GB200 NVL72
  • NVIDIA HGX B200
  • NVIDIA H200
  • NVIDIA H100
Solutions

  • Customer Stories: Testimonials from AI pioneers
  • Startup Accelerator: Build and scale your startup
  • Enterprise: Secure, reliable AI infrastructure
  • Why Open Source: How to own your AI
  • Industries & Use-Cases: Scale your business with Together AI

Customer Stories

  • How Hedra Scales Viral AI Video Generation with 60% Cost Savings
  • When Standard Inference Frameworks Failed, Together AI Enabled 5x Performance Breakthrough

Developers

  • Documentation: Technical docs for using Together AI
  • Research: Advancing the open-source AI frontier
  • Model Library: All our open-source models
  • Cookbooks: Practical implementation guides
  • Example Apps: Our open-source demo apps

Videos

  • DeepSeek-R1: How It Works, Simplified!
  • Together Code Sandbox: How To Build AI Coding Agents

Pricing

  • Pricing Overview: Our platform & GPU pricing
  • Inference: Per-token & per-minute pricing (worked example below)
  • Fine-Tuning: LoRA and full fine-tuning pricing
  • GPU Clusters: Hourly rates & custom pricing
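Per-token billing is straightforward to estimate in advance. A small worked example, with made-up rates purely for illustration (actual prices are on the pricing pages above):

```python
# Hypothetical per-token pricing arithmetic. The rates below are
# illustrative, NOT Together AI's actual prices.
input_price_per_m = 0.20   # USD per 1M input tokens (made-up rate)
output_price_per_m = 0.60  # USD per 1M output tokens (made-up rate)

input_tokens = 1_200_000
output_tokens = 300_000

# Cost = tokens (in millions) x price per million, summed over directions.
cost = (input_tokens / 1e6) * input_price_per_m \
     + (output_tokens / 1e6) * output_price_per_m
print(f"Estimated cost: ${cost:.2f}")  # -> Estimated cost: $0.42
```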

Questions? We’re here to help! Talk to us →

Company

  • About Us: Get to know us
  • Values: Our approach to open-source AI
  • Team: Meet our leadership
  • Careers: Join our mission

Resources

  • Blog: Our latest news & blog posts
  • Research: Advancing the open-source AI frontier
  • Events: Explore our events calendar
  • Knowledge Base: Find answers to your questions

Featured Content

  • Learn how Cursor partnered with Together AI to deliver real-time, low-latency inference at scale
  • Best practices to accelerate inference for large-scale production workloads


© 2026 Together AI, San Francisco, CA 94114
