Models / OpenAI

OpenAI

Deploy gpt-oss-120B and gpt-oss-20B on Together AI. Frontier reasoning performance under Apache 2.0 with complete model ownership.

Why OpenAI on Together AI?

Designed for production workloads that need 
consistent performance and operational control.

Frontier reasoning, open license

gpt-oss-120B and gpt-oss-20B deliver o3-class reasoning performance under Apache 2.0, with no restrictions on commercial use, fine-tuning, or deployment.

Deploy anywhere, own everything

Air-gapped deployments, on-premises, or Together AI cloud. Full model ownership means your infrastructure, your data, your terms.

Enterprise infrastructure from day one

99.9% uptime SLA, multi-region deployment, SOC 2 Type II certified, and HIPAA compliant. North American infrastructure with US-based deployment.

Meet the OpenAI family

Explore top-performing models across text, image, video, code, and voice.

New

Chat

gpt-oss-120B

new

Image

GPT Image 1.5

New

Video

Sora 2 Pro

Transcribe

Whisper Large v3

Video

Sora 2

Transcribe

Whisper Large v3 (Streaming)

New

Code

gpt-oss-20B

Deployment options

Run models using different deployment options depending on latency needs, traffic patterns, and infrastructure control.

  • Serverless

  • Inference

Serverless Inference

Real-time

A fully managed inference API that automatically scales with request volume.

Best for

Variable or unpredictable traffic

Rapid prototyping and iteration

Cost-sensitive or early-stage production workloads

Batch

Process massive workloads of up to 30 billion tokens asynchronously, at up to 50% less cost.

Best for

Classifying large datasets

Offline summarization

Synthetic data generation

Dedicated Inference

Dedicated Model Inference

An inference endpoint backed by reserved, isolated compute resources and the Together AI inference engine.

Best for

Predictable or steady traffic

Latency-sensitive applications

High-throughput production workloads

Dedicated Container Inference

Run inference with your own engine and model on fully-managed, scalable infrastructure.

Best for

Generative media models

Non-standard runtimes

Custom inference pipelines