Mistral AI

Deploy Mistral's model family on Together AI. State-of-the-art performance, Apache 2.0 open weights, and native multilingual support.

Why Mistral AI on Together AI?

Designed for production workloads that need consistent performance and operational control.

State-of-the-art at 8× lower cost

Mistral models deliver enterprise-grade performance at a fraction of the price: Mistral Medium 3 matches state-of-the-art benchmarks while cutting costs by 8× versus closed-source alternatives.

Multilingual and transparent by design

Native support across English, French, Spanish, German, Italian, and more. Magistral's reasoning chain is fully visible — transparent thinking you can follow and verify across languages.

From frontier to edge, open licensed

Apache 2.0 licensing, on-premises deployment, and fine-tuning on proprietary data. SOC 2 Type II certified and HIPAA compliant on Together AI's US-based infrastructure.

Meet the Mistral AI family

Explore top-performing models across text, code, and voice.

  • Ministral 3 8B Instruct 2512 (Chat, New)

  • Ministral 3 14B Instruct 2512 (Chat, New)

  • Ministral 3 3B Instruct 2512 (Chat, New)

  • Mistral Small 3 (Chat)

  • Voxtral-Mini-3B-2507 (Transcribe, New)

  • Mistral Instruct (Chat)

  • Devstral Small 2505 (Code, New)

  • Magistral Small 2506 (Code, New)

  • Mistral (Chat)

  • Mixtral 8x7B Instruct v0.1 (Chat)

  • Mixtral 8x7B v0.1 (Chat)

  • Mistral (7B) Instruct v0.2 (Chat)

Breakthrough technical innovations

Explore all the game-changing architectural advances that make Mistral AI models shine.

  • Mixture of Experts (MoE)

    Sparse expert routing sends each token through only 2 of 8 experts in Mixtral 8x7B, activating roughly 13B of its 47B total parameters. Balanced expert routing maintains quality while sharply reducing compute per token (see the routing sketch after this list).

  • Group Relative Policy Optimization

    The reinforcement learning approach behind Magistral's reasoning training: advantages are estimated relative to a group of sampled completions, removing the separate value network used in classic RLHF and cutting compute requirements while maintaining training stability.

  • Native Reasoning Transparency

    Magistral exposes its complete thinking process in <think> tags, with reasoning capabilities built into the model foundation through large-scale reinforcement learning.

  • Sliding Window Attention

    Each layer attends within a fixed 4,096-token window; stacked layers extend the effective receptive field well beyond the window while keeping attention compute and memory bounded.

  • Grouped-Query Attention

    Query heads share a reduced set of key/value heads (32 query heads to 8 KV heads in Mistral 7B), cutting KV-cache memory requirements while maintaining modeling performance. Optimized for efficient inference deployment (see the cache arithmetic after this list).

  • Rolling Buffer Cache

    The KV cache is capped at the attention window size W: position i writes to slot i mod W, so cache memory stays constant on long sequences instead of growing with context length.
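
To make the expert routing concrete, here is a minimal top-2 routing sketch in PyTorch. The dimensions, gating layer, and expert networks are illustrative assumptions, not Mixtral's actual configuration.

```python
# Minimal top-2 expert routing sketch (illustrative shapes, not Mixtral's real config).
import torch
import torch.nn as nn
import torch.nn.functional as F

def moe_layer(x, router, experts, top_k=2):
    """x: (n_tokens, d_model). Each token is processed only by its top_k experts."""
    scores = router(x)                            # (n_tokens, n_experts) router logits
    weights, chosen = scores.topk(top_k, dim=-1)  # per-token expert choices
    weights = F.softmax(weights, dim=-1)          # renormalize over the chosen experts
    out = torch.zeros_like(x)
    for e, expert in enumerate(experts):
        for slot in range(top_k):
            mask = chosen[:, slot] == e           # tokens whose slot-th pick is expert e
            if mask.any():
                out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
    return out

# Toy usage: 8 experts, 2 active per token, as in Mixtral-style routing.
d_model, n_experts = 64, 8
router = nn.Linear(d_model, n_experts, bias=False)
experts = nn.ModuleList(nn.Linear(d_model, d_model) for _ in range(n_experts))
y = moe_layer(torch.randn(10, d_model), router, experts)
```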

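To see why grouped-query attention matters for serving, here is a quick back-of-the-envelope on KV-cache size, using Mistral 7B's published shape (32 layers, 8 KV heads, head dimension 128) and fp16 values:

```python
# KV cache bytes = 2 (K and V) * layers * kv_heads * head_dim * seq_len * bytes_per_value
def kv_cache_gib(layers, kv_heads, head_dim, seq_len, bytes_per_value=2):
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_value / 2**30

# Mistral 7B-style shape at a 32k-token context, fp16:
print(kv_cache_gib(32, 8, 128, 32_768))   # GQA, 8 KV heads   -> 4.0 GiB
print(kv_cache_gib(32, 32, 128, 32_768))  # MHA, 32 KV heads  -> 16.0 GiB
```
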
Deployment options

Run models using different deployment options depending on latency needs, traffic patterns, and infrastructure control.

Serverless Inference

Real-time

A fully managed inference API that automatically scales with request volume.

Best for

Variable or unpredictable traffic

Rapid prototyping and iteration

Cost-sensitive or early-stage production workloads
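
A minimal serverless call with the Together Python SDK, assuming TOGETHER_API_KEY is set in your environment and that the Mixtral 8x7B Instruct model ID below is available to your account:

```python
# pip install together
from together import Together

client = Together()  # reads TOGETHER_API_KEY from the environment

response = client.chat.completions.create(
    model="mistralai/Mixtral-8x7B-Instruct-v0.1",  # assumed serverless model ID
    messages=[{"role": "user", "content": "Summarize sliding window attention in two sentences."}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```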

Batch

Process massive workloads of up to 30 billion tokens asynchronously, at up to 50% less cost.

Best for

Classifying large datasets

Offline summarization

Synthetic data generation
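
Batch jobs consume a file of requests, one JSON object per line. The per-line schema below is an assumption modeled on common batch-API formats; check Together's batch documentation for the authoritative fields before submitting:

```python
# Build a JSONL file of requests for asynchronous batch processing.
# The request schema here is an assumption; consult Together's batch docs.
import json

documents = ["First document to classify...", "Second document to classify..."]

with open("batch_requests.jsonl", "w") as f:
    for i, doc in enumerate(documents):
        request = {
            "custom_id": f"doc-{i}",  # your key for matching results to inputs
            "body": {
                "model": "mistralai/Mixtral-8x7B-Instruct-v0.1",
                "messages": [{"role": "user", "content": f"Classify the topic of: {doc}"}],
                "max_tokens": 16,
            },
        }
        f.write(json.dumps(request) + "\n")

# Upload the file and create the batch job through Together's Files/Batches API.
```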

Dedicated Inference

Dedicated Model Inference

An inference endpoint backed by reserved, isolated compute resources and the Together AI inference engine.

Best for

Predictable or steady traffic

Latency-sensitive applications

High-throughput production workloads
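
Calling a dedicated endpoint looks the same as a serverless call; only the model string changes to the endpoint you provisioned. The sketch below uses a hypothetical endpoint name and streams the response to measure time-to-first-token, the metric dedicated endpoints are typically sized for:

```python
import time
from together import Together

client = Together()
start = time.perf_counter()

# "your-org/mistral-small-3-dedicated" is a hypothetical endpoint name.
stream = client.chat.completions.create(
    model="your-org/mistral-small-3-dedicated",
    messages=[{"role": "user", "content": "ping"}],
    stream=True,
)
for chunk in stream:
    if chunk.choices[0].delta.content:
        print(f"time to first token: {time.perf_counter() - start:.3f}s")
        break
```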

Dedicated Container Inference

Run inference with your own engine and model on fully-managed, scalable infrastructure.

Best for

Generative media models

Non-standard runtimes

Custom inference pipelines
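
With container inference you bring the serving layer yourself. As a minimal sketch of the kind of custom engine you might package (FastAPI and the /generate route are illustrative choices, not a Together contract):

```python
# Minimal custom inference server sketch to package into a container.
# The route and schema are illustrative, not a Together requirement.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class GenerateRequest(BaseModel):
    prompt: str
    max_tokens: int = 64

@app.post("/generate")
def generate(req: GenerateRequest) -> dict:
    # Replace this stub with your own engine's generation call.
    return {"text": f"echo: {req.prompt[:req.max_tokens]}"}

# Run inside the container with: uvicorn server:app --host 0.0.0.0 --port 8080
```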