OpenAI

Deploy gpt-oss-120B and gpt-oss-20B on Together AI. Frontier reasoning performance under Apache 2.0 with complete model ownership.

Why OpenAI on Together AI?

Designed for production workloads that need consistent performance and operational control.

Frontier reasoning, open license

gpt-oss-120B and gpt-oss-20B deliver o3-class reasoning performance under Apache 2.0, with no restrictions on commercial use, fine-tuning, or deployment.

Deploy anywhere, own everything

Air-gapped deployments, on-premises, or Together AI cloud. Full model ownership means your infrastructure, your data, your terms.

Enterprise infrastructure from day one

99.9% uptime SLA, multi-region deployment, SOC 2 Type II certified, and HIPAA compliant. North American infrastructure with US-based deployment.
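
Getting started takes only a few lines. Below is a minimal sketch of a serverless chat completion, assuming the Together Python SDK and openai/gpt-oss-120b as the model ID (verify the exact ID in the model catalog):

```python
# Minimal chat completion against Together AI's serverless API.
# Assumptions: `pip install together`, TOGETHER_API_KEY set in the
# environment, and "openai/gpt-oss-120b" as the model ID (verify in
# the model catalog before relying on it).
from together import Together

client = Together()  # reads TOGETHER_API_KEY from the environment

response = client.chat.completions.create(
    model="openai/gpt-oss-120b",
    messages=[
        # gpt-oss reasoning effort is set in the system message,
        # e.g. "Reasoning: high" per OpenAI's model card.
        {"role": "system", "content": "Reasoning: high"},
        {"role": "user", "content": "Explain mixture-of-experts routing in two sentences."},
    ],
)
print(response.choices[0].message.content)
```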

Meet the OpenAI family

Explore top-performing models across text, image, video, code, and voice.

  • Chat: gpt-oss-120B (New)

  • Image: GPT Image 1.5 (New)

  • Video: Sora 2 Pro (New)

  • Transcribe: Whisper Large v3

  • Video: Sora 2

  • Transcribe: Whisper Large v3 (Streaming)

  • Code: gpt-oss-20B (New)

Breakthrough technical innovations

Explore the game-changing architectural advances that make the gpt-oss models shine.

  • Mixture of Experts (MoE)

    Sparse expert routing activates only 5.1B of gpt-oss-120B's 117B total parameters per token (3.6B of 21B for gpt-oss-20B), cutting inference cost without sacrificing quality. A routing sketch follows this list.

  • Configurable Reasoning Effort

    Reasoning depth is adjustable per request (low, medium, or high), letting you trade latency and cost against answer quality on a call-by-call basis.

  • Native Reasoning Transparency

    Full chain-of-thought is exposed through the harmony response format, so you can inspect and audit how an answer was reached. Reasoning is built into the model foundation through large-scale reinforcement learning.

  • MXFP4 Quantization

    MoE weights ship natively quantized to MXFP4, allowing gpt-oss-120B to run on a single 80 GB GPU and gpt-oss-20B to run within 16 GB of memory.

  • Efficient Attention Design

    Alternating dense and locally banded sparse attention, combined with grouped multi-query attention, reduces KV-cache memory requirements while supporting a 128K context window.

  • Agentic Tool Use

    Trained for native function calling, web browsing, and Python execution, with structured output support for agentic workflows.
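
To make the MoE idea concrete, here is a toy sketch of top-k expert routing. The sizes and names are hypothetical illustrations, not OpenAI's production implementation:

```python
# Toy top-k mixture-of-experts routing, illustrating the MoE bullet above.
# Hypothetical sizes; real MoE layers use learned routers and fused kernels.
import numpy as np

n_experts, top_k, d_model = 8, 2, 16
rng = np.random.default_rng(0)
router = rng.standard_normal((d_model, n_experts))            # router weights
experts = rng.standard_normal((n_experts, d_model, d_model))  # one toy FFN matrix per expert

def moe_layer(x):
    """Route a single token vector x through its top-k experts."""
    logits = x @ router                    # score each expert for this token
    top = np.argsort(logits)[-top_k:]      # pick the k best-scoring experts
    w = np.exp(logits[top])
    w /= w.sum()                           # softmax over the chosen experts only
    # Only the top-k experts execute: this is why a 117B-parameter model
    # can activate just ~5B parameters per token.
    return sum(wi * (x @ experts[i]) for wi, i in zip(w, top))

token = rng.standard_normal(d_model)
print(moe_layer(token).shape)  # (16,)
```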

Deployment options

Run models using different deployment options depending on latency needs, traffic patterns, and infrastructure control.


Serverless Inference

Real-time

A fully managed inference API that automatically scales with request volume.

Best for

Variable or unpredictable traffic

Rapid prototyping and iteration

Cost-sensitive or early-stage production workloads
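
Serverless endpoints also support streaming for real-time user experiences. A minimal sketch, under the same assumptions as the earlier example (Together SDK, hedged model ID):

```python
# Stream tokens from the serverless endpoint as they are generated.
# Assumes the `together` SDK, TOGETHER_API_KEY in the environment, and
# "openai/gpt-oss-120b" as the model ID (verify in the model catalog).
from together import Together

client = Together()
stream = client.chat.completions.create(
    model="openai/gpt-oss-120b",
    messages=[{"role": "user", "content": "Summarize MXFP4 quantization in one line."}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content  # OpenAI-style streaming chunks
    if delta:
        print(delta, end="", flush=True)
```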

Batch

Process massive workloads of up to 30 billion tokens asynchronously, at up to 50% lower cost.

Best for

Classifying large datasets

Offline summarization

Synthetic data generation
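
Batch jobs take a JSONL file of independent requests. A sketch of preparing one for a classification workload; the line schema below follows the common OpenAI-style batch format and is an assumption here, so confirm field names against the Batch API docs:

```python
# Build a JSONL input file for the Batch API: one JSON object per request.
# The "custom_id"/"body" line schema is an assumed OpenAI-style format;
# verify the exact fields in Together's Batch API documentation.
import json

documents = ["First document to classify...", "Second document to classify..."]

with open("batch_input.jsonl", "w") as f:
    for i, doc in enumerate(documents):
        line = {
            "custom_id": f"doc-{i}",            # your ID for matching outputs to inputs
            "body": {
                "model": "openai/gpt-oss-20b",  # smaller model keeps batch costs low
                "messages": [
                    {"role": "system", "content": "Classify the sentiment as positive or negative."},
                    {"role": "user", "content": doc},
                ],
            },
        }
        f.write(json.dumps(line) + "\n")
# Upload batch_input.jsonl and create the batch job via the SDK or dashboard.
```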

Dedicated Inference

Dedicated Model Inference

An inference endpoint backed by reserved, isolated compute resources and the Together AI inference engine.

Best for

Predictable or steady traffic

Latency-sensitive applications

High-throughput production workloads
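
Because the API surface is OpenAI-compatible, a dedicated endpoint can also be called with the official openai Python client pointed at Together's base URL. The endpoint ID below is a hypothetical placeholder:

```python
# Calling a dedicated endpoint through the OpenAI-compatible API surface.
# "your-org/gpt-oss-120b-dedicated" is a hypothetical placeholder; use the
# endpoint ID shown in your dashboard after deploying.
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["TOGETHER_API_KEY"],
    base_url="https://api.together.xyz/v1",
)

response = client.chat.completions.create(
    model="your-org/gpt-oss-120b-dedicated",  # hypothetical dedicated endpoint ID
    messages=[{"role": "user", "content": "Health check: reply with OK."}],
)
print(response.choices[0].message.content)
```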

Dedicated Container Inference

Run inference with your own engine and model on fully managed, scalable infrastructure.

Best for

Generative media models

Non-standard runtimes

Custom inference pipelines