Meta

Deploy Llama 4 Maverick and Scout on Together AI. Frontier multimodal performance, 10M token context, and 80%+ cost savings versus GPT-4o.

Why Meta on Together AI?

Designed for production workloads that need consistent performance and operational control.

Open source freedom, enterprise grade

Full model ownership — download the weights, deploy on Together AI’s cloud, or run on-premises. Your data never trains our models and never leaves your control.

Frontier multimodal performance

Llama 4 Maverick beats GPT-4o and Gemini 2.0 Flash on key benchmarks at just $0.27/1M tokens — an 80%+ cost reduction versus closed-source alternatives.

Built for scale, ready for enterprise

SOC 2 Type II certified, HIPAA compliant, with dedicated endpoints, monthly reserved capacity, and up to 40% savings at volume.

Meet the Meta family

Explore top-performing models across text, image, video, code, and voice.

Chat

  • Llama 4 Maverick
  • Llama 4 Scout
  • Llama 3.3 70B Instruct Turbo Free (free)
  • Llama 3.3 70B
  • Llama 3.1 405B
  • Llama 3.1 70B
  • Llama 3.1 8B
  • Llama 3.2 3B Instruct Turbo
  • Llama 3 70B Instruct Reference
  • Llama 3 8B Instruct Lite
  • LLaMA-2
  • LLaMA-2 Chat (13B)
  • LLaMA-2 Chat (7B)
  • NIM Llama 3.3 70B Instruct
  • NIM Llama 3.3 Nemotron Super 49B v1
  • NIM Llama 3.1 Nemotron 70B Instruct
  • NIM Llama 3.1 70B Instruct
  • NIM Llama 3.1 8B Instruct
  • NIM Mistral-NeMo 12B Instruct
  • NIM Mixtral 8x7B Instruct v0.1
  • NIM Mixtral 8x22B Instruct v0.1

Vision

  • NIM Llama 3.2 11B Vision Instruct
  • NIM Llama 3.2 90B Vision Instruct

Moderation

  • Llama Guard 4 12B (new)
  • Llama Guard 3 11B Vision Turbo
  • Llama Guard 3 8B
  • Llama Guard 2 8B
  • Llama Guard (7B)
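The Llama Guard models above classify prompts and responses rather than answer them. Here is a minimal moderation sketch using Together's Python SDK, assuming the TOGETHER_API_KEY environment variable is set; the model ID string and the exact verdict format are assumptions to verify against the model page:

    # pip install together
    from together import Together

    client = Together()  # reads TOGETHER_API_KEY from the environment

    # Llama Guard returns a verdict, not an answer: "safe", or "unsafe"
    # followed by the violated category codes (e.g. "unsafe\nS2").
    verdict = client.chat.completions.create(
        model="meta-llama/Llama-Guard-4-12B",  # assumed model ID
        messages=[{"role": "user", "content": "Tell me how to hot-wire a car."}],
    )
    print(verdict.choices[0].message.content.strip())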

Breakthrough technical innovations

Explore the architectural advances that make Meta's latest models shine.

  • Mixture of Experts (MoE)

    Llama 4 is Meta's first MoE generation. Maverick activates only 17B of its 400B total parameters per token, routing each token to one of 128 experts plus a shared expert that always runs. Sparse activation cuts inference cost while preserving quality (see the routing sketch after this list).

  • iRoPE Architecture

    Interleaved attention layers without positional embeddings, paired with inference-time temperature scaling of attention, improve length generalization and underpin Scout's 10M-token context window.

  • Native Multimodality with Early Fusion

    Text and vision tokens are fused into a single model backbone from the start, enabling joint pre-training on large volumes of text, image, and video data.

  • MetaP Training

    A technique for reliably setting per-layer hyperparameters such as learning rates and initialization scales, with choices that transfer across batch sizes, model widths, depths, and training token budgets.

  • FP8 Precision Training

    Llama 4 is pre-trained in FP8 mixed precision, raising per-GPU training throughput without sacrificing model quality.

  • Codistillation from Behemoth

    Maverick is codistilled from Llama 4 Behemoth, a roughly 288B active-parameter teacher model, using a distillation loss that dynamically weights soft and hard targets during training.
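To make the expert-routing idea above concrete, here is a toy top-1 routed MoE layer with a shared expert, written in PyTorch. It is an illustrative sketch only, not Meta's implementation; the dimensions, expert count, and feed-forward shape are arbitrary assumptions.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class SparseMoELayer(nn.Module):
        """Toy top-1 routed MoE layer with an always-on shared expert."""
        def __init__(self, d_model=64, n_experts=8):
            super().__init__()
            self.router = nn.Linear(d_model, n_experts, bias=False)
            ffn = lambda: nn.Sequential(
                nn.Linear(d_model, 4 * d_model), nn.SiLU(),
                nn.Linear(4 * d_model, d_model))
            self.experts = nn.ModuleList(ffn() for _ in range(n_experts))
            self.shared = ffn()

        def forward(self, x):                         # x: [n_tokens, d_model]
            gate = F.softmax(self.router(x), dim=-1)  # routing probabilities
            weight, idx = gate.max(dim=-1)            # top-1 expert per token
            out = self.shared(x)                      # shared expert sees every token
            for e, expert in enumerate(self.experts):
                sel = idx == e                        # tokens routed to expert e
                if sel.any():                         # only that expert's weights run
                    out[sel] = out[sel] + weight[sel, None] * expert(x[sel])
            return out

    tokens = torch.randn(10, 64)
    print(SparseMoELayer()(tokens).shape)             # torch.Size([10, 64])

Only the selected expert's feed-forward weights execute for each token, which is why a 400B-parameter model can serve traffic at the cost profile of a much smaller dense model.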

Deployment options

Run models using different deployment options depending on latency needs, traffic patterns, and infrastructure control.

Serverless Inference

Real-time

A fully managed inference API that automatically scales with request volume.

Best for

Variable or unpredictable traffic

Rapid prototyping and iteration

Cost-sensitive or early-stage production workloads
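A single serverless request needs nothing but an API key. Below is a minimal sketch using Together's Python SDK; the Llama 4 Maverick model ID string is an assumption to check against the models page:

    # pip install together
    from together import Together

    client = Together()  # reads TOGETHER_API_KEY from the environment

    response = client.chat.completions.create(
        model="meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8",  # assumed ID
        messages=[{"role": "user",
                   "content": "Give me three taglines for a hiking app."}],
    )
    print(response.choices[0].message.content)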

Batch

Process massive workloads of up to 30 billion tokens asynchronously, at up to 50% less cost.

Best for

Classifying large datasets

Offline summarization

Synthetic data generation
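Batch jobs take a JSONL file with one request per line, each tagged with a custom_id so results can be matched back to their source records. Here is a sketch of building such a file; the Scout model ID and the exact JSONL schema are assumptions to verify against the Batch API docs:

    import json

    # One chat-completion request per line; custom_id links each result
    # in the output file back to its input record.
    with open("batch_input.jsonl", "w") as f:
        for i, doc in enumerate(["doc one ...", "doc two ...", "doc three ..."]):
            request = {
                "custom_id": f"doc-{i}",
                "body": {
                    "model": "meta-llama/Llama-4-Scout-17B-16E-Instruct",  # assumed ID
                    "messages": [{"role": "user",
                                  "content": f"Classify the topic of: {doc}"}],
                    "max_tokens": 16,
                },
            }
            f.write(json.dumps(request) + "\n")
    # Upload the file via the Batch API, then poll the job until it completes.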

Dedicated Inference

Dedicated Model Inference

An inference endpoint backed by reserved, isolated compute resources and the Together AI inference engine.

Best for

Predictable or steady traffic

Latency-sensitive applications

High-throughput production workloads

Dedicated Container Inference

Run inference with your own engine and model on fully managed, scalable infrastructure.

Best for

Generative media models

Non-standard runtimes

Custom inference pipelines