Models / Qwen

Qwen

Deploy Qwen3 and QwQ models on Together AI. Hybrid reasoning, agentic coding, and OpenAI-compatible API — open source under Apache 2.0.

Try Model API

Why Qwen on Together AI?

Designed for production workloads that need  consistent performance and operational control.

Drop-in OpenAI replacement

Same API format, hybrid thinking mode, and multilingual support. Migrate from OpenAI with zero code changes.

From edge to frontier, one family

Models spanning sub-1B to 480B+ parameters with adaptive scaling for every use case and budget.

Open source, enterprise licensed

Apache 2.0 licensing gives you full commercial freedom. SOC 2 Type II certified, HIPAA compliant, US-based infrastructure.

Meet the Qwen family

Explore top-performing models across text, image, video, code, and voice.

Browse models

Deploy own model

new

Chat

Qwen3.7-Plus

new

Chat

Qwen3.7-Max

new

Chat

Qwen3.6 35B A3B FP8

Chat

Qwen3.5-397B-A17B

New

Chat

Qwen3.5 9B

new

Image

Qwen Image 2.0 Pro

new

Image

Qwen Image 2.0

Chat

Qwen3 235B A22B FP8 Throughput

Chat

Qwen3-Coder 480B A35B Instruct

Chat

Qwen3-Coder-Next

new

Chat

Qwen3.6-Plus

Chat

Qwen2.5 7B Instruct Turbo

Chat

Qwen3 32B

Chat

Qwen3 0.6B

Chat

Qwen3 0.6B Base

Chat

Qwen3 1.7B

Chat

Qwen3 1.7B Base

Chat

Qwen3 14B Base

Chat

Qwen3 30B A3B

Chat

Qwen3 30B A3B Base

Chat

Qwen3 4B

Chat

Qwen3 4B Base

Chat

Qwen3 8B

Image

Qwen Image

Coming Soon

Image

Qwen Image Edit

Chat

Qwen3-Next-80B-A3B-Instruct

Chat

Qwen2.5 72B

Chat

Qwen2.5 Coder 32B Instruct

Chat

Qwen QwQ-32B

Vision

Qwen2.5-VL 72B Instruct

Chat

Qwen3 235B A22B Instruct 2507 FP8 Throughput

Chat

Qwen3 235B A22B Thinking 2507 FP8

Chat

Qwen3-Next-80B-A3B-Thinking

Vision

Qwen3-VL-32B-Instruct

Have your own model?

Deploy custom containers on Together’s managed GPU infrastructure with automatic scaling, job queues, and built-in observability.

Learn more

Deployment options

Run models using different deployment options depending on latency needs, traffic patterns, and infrastructure control.

Serverless Inference
Provisioned  Throughput
Dedicated Model  Inference
Dedicated Container  Inference

Serverless Inference

A fully managed real-time or batch inference API with access to dozens of the most popular AI models.

Best for

Variable or unpredictable traffic

Rapid prototyping and iteration

Cost-sensitive or early-stage production workloads

Get started

Explore Docs

Provisioned  Throughput

Reserved token capacity with SLA guarantees. Priced in PTUs, a normalized throughput unit.

Best for

Production workloads

Reliability guarantees

Predictable pricing

Get started

Explore Docs

Dedicated Model  Inference

An inference endpoint backed by reserved, isolated compute resources and Together AI inference research.

Best for

Predictable or steady traffic

Latency-sensitive applications

High-throughput production workloads

Get started

Explore Docs

Dedicated Container  Inference

Run inference with your own engine and model on fully-managed, scalable infrastructure.

Best for

Generative media models

Non-standard runtimes

Custom inference pipelines

Contact sales

Explore Docs

Qwen

Why Qwen on Together AI?

Meet the Qwen family

Have your own model?

Deployment options

Serverless Inference

Provisioned Throughput

Dedicated Model Inference

Dedicated Container Inference

Provisioned  Throughput

Dedicated Model  Inference

Dedicated Container  Inference