Qwen
Think Deeper. Act Faster.
Open-source AI family powering instant chat and deep reasoning. Deploy hybrid models with support for 119 languages and strong coding, math, and vision skills.

Get Started in Minutes
Open-source drop-in for OpenAI: same API, hybrid thinking, multilingual support. Switch to Qwen instantly.
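A minimal sketch of what "same API" means in practice: you send the standard OpenAI-format chat-completion payload and only the base URL changes. The request is built with the stdlib here for illustration; the exact model slug and endpoint path should be confirmed against Together's API reference.

```python
import json
import urllib.request

TOGETHER_URL = "https://api.together.xyz/v1/chat/completions"

def build_request(prompt: str) -> urllib.request.Request:
    """Build an OpenAI-format chat completion request aimed at Together.

    The payload shape is the standard OpenAI chat schema; the model slug
    below follows this page's naming and is an assumption.
    """
    payload = {
        "model": "Qwen/Qwen3-235B-A22B-FP8",
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        TOGETHER_URL,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": "Bearer YOUR_TOGETHER_API_KEY",
            "Content-Type": "application/json",
        },
    )
```

Because the wire format is unchanged, existing OpenAI SDK code typically only needs its base URL (and API key) swapped to point at Together.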
Why Qwen on Together AI?
From sub-1B to 200B+ parameters - one model family that adapts to every scale
An open-source AI family whose newest flagship models introduce breakthrough hybrid thinking, beat OpenAI o1 on benchmarks, and offer specialized variants for every domain, with exceptional fine-tuning and complete model ownership.
VERIFIED BENCHMARK LEADERSHIP
Qwen3-235B-A22B beats OpenAI o1 across key metrics: 85.7 vs 74.3 on AIME'24, 98.0% vs 96.4% on MATH-500, and 2,056 vs 1,891 CodeForces rating. Achieves frontier performance using only 22B activated parameters vs o1's larger footprint.
Outperforms o1 with 90% lower computational cost
HYBRID THINKING ARCHITECTURE
Revolutionary dual-mode system that switches between instant responses and deep reasoning on demand. Qwen3 models can provide lightning-fast answers for simple queries or engage in step-by-step thinking for complex problems - all controlled by a simple parameter.
Single model replaces multiple specialized models
COMPLETE OPEN SOURCE ECOSYSTEM
Full model weights available under Apache 2.0 license with specialized variants for every use case. Deploy anywhere - cloud, VPC, or on-premise. You own your models, your data, and your AI pipeline completely.
Complete model ownership vs closed-source dependencies
Meet the Qwen Herd
From ultralight edge models to reasoning powerhouses, choose the Qwen variant that perfectly fits your needs.

The real* Qwen capybara seen hanging out near the Golden Gate Bridge.
Breakthrough Technical Innovations
Qwen models introduce breakthrough architectural advances that redefine open-source AI capabilities.
Hybrid Thinking Architecture
First open-source models with controllable thinking modes: toggle between instant answers & step-by-step reasoning via one parameter—no external CoT prompting.
Scalable thinking budget control
Mixture of Experts Efficiency
Qwen3-235B-A22B’s advanced MoE activates only 22B of 235B parameters, achieving frontier performance with only a fraction of the usual compute.
235B params, 22B active per token
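The efficiency claim above is simple arithmetic worth making explicit: with only 22B of 235B parameters activated per token, under a tenth of the model's weights participate in each forward pass.

```python
# Back-of-envelope: per-token active-parameter fraction for Qwen3-235B-A22B,
# using the totals quoted above (235B total, 22B active).
total_params = 235e9
active_params = 22e9
active_fraction = active_params / total_params
print(f"{active_fraction:.1%} of parameters active per token")  # ~9.4%
```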
Vision-Language Integration
Qwen2.5-VL powers visual reasoning, video understanding, and structured output with native resolution—analyzing hour-long videos with precise localization.
Hour-long video comprehension
Specialized Model Variants
Purpose-built AI for coding (Qwen2.5-Coder), vision (Qwen2.5-VL), and reasoning (QwQ), tuned for domain excellence yet versatile for general tasks.
4+ specialized model families
Reinforcement Learning
QwQ-32B uses pure RL training to master complex reasoning, matching far larger closed models while keeping its reasoning transparent.
Pure RL achieves frontier-class performance
Advanced Context Processing
Up to 131K-token context via optimized attention. RoPE, SwiGLU, RMSNorm, and advanced mechanisms enable efficient long-text processing.
Up to 131K context length
Deploy on Together AI
Access Qwen models through Together's optimized inference platform with enterprise-grade security and performance guarantees.
Serverless Endpoints
Pay-per-token pricing with automatic scaling. Perfect for getting started or variable workloads.
Best for:
Prototyping and development
Variable or unpredictable traffic
Cost optimization for low volume
Getting started quickly
Qwen3-235B-A22B-FP8:
$0.20 input/$0.60 output
Qwen2.5-VL-72B:
$1.95 input/$8 output
On-Demand Dedicated
Dedicated GPU capacity with guaranteed performance. No rate limits. Built for production.
Best for:
Production applications
Extended model library access
Predictable latency requirements
Enterprise SLA needs
Qwen3-32B:
$0.11/minute (2x H100)
Qwen2.5-VL-72B:
$0.22/minute (4x H100)
Monthly Reserved
Committed GPU capacity with enterprise features and volume discounts. Optimized for scale.
Best for:
High-volume committed usage
Enterprise security requirements
Priority hardware access
Maximum cost efficiency
Reserved GPU pricing:
Starting $0.98/hr
Volume Discounts:
Up to 40% savings
Enterprise-Grade Security
Your data and models remain fully under your control with industry-leading security standards.
SOC 2 Type II
Comprehensive security controls audited by third parties.
HIPAA Compliant
Healthcare-grade data protection for sensitive workloads.
Model Ownership
You own your fine-tuned models and can deploy anywhere.
US-Based Infrastructure
Models hosted on secure North American servers with strict data sovereignty controls.
Real Performance Benchmarks
See how Qwen3-235B-A22B delivers SOTA performance, outperforming leading models across reasoning, math & coding.
Try Qwen Models Now - Free
Experience the performance difference in Together Chat.
Frequently Asked Questions
What makes Qwen3's hybrid thinking different from other reasoning models?
Qwen3-235B-A22B introduces controllable thinking modes where you can switch between instant responses and step-by-step reasoning with a simple parameter. Unlike models that always use long reasoning chains, Qwen3 adapts its computational budget to the task complexity, providing optimal efficiency with MoE architecture.
How does Qwen performance compare to OpenAI o1?
Qwen3-235B-A22B outperforms OpenAI o1 on key benchmarks: 85.7 vs 74.3 on AIME'24, 98.0% vs 96.4% on MATH-500, and 2,056 vs 1,891 CodeForces rating, while using only 22B activated parameters vs o1's larger computational footprint.
What are the current pricing rates for Qwen models?
- Qwen3-235B-A22B-FP8: $0.20 input/$0.60 output.
- Qwen2.5-VL-72B: $1.95 input/$8 output.
- QwQ-32B: $1.20 per 1M tokens.
- Qwen2.5-Coder-32B: $0.80 per 1M tokens.
- Qwen2.5-7B-Turbo: $0.30 per 1M tokens.
Volume discounts available for enterprise.
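A quick way to turn the rates above into a monthly estimate, assuming the listed input/output rates are per 1M tokens (check the pricing page) and a hypothetical workload of 50M input and 10M output tokens:

```python
# Monthly serverless cost sketch for Qwen3-235B-A22B-FP8 at the rates
# listed above, assuming per-1M-token pricing (an assumption).
INPUT_RATE = 0.20   # $ per 1M input tokens
OUTPUT_RATE = 0.60  # $ per 1M output tokens

def monthly_cost(input_millions: float, output_millions: float) -> float:
    """Estimated monthly spend in dollars for a given token volume."""
    return input_millions * INPUT_RATE + output_millions * OUTPUT_RATE

print(f"${monthly_cost(50, 10):.2f}")  # $16.00
```

At this volume, serverless pay-per-token is far cheaper than a dedicated endpoint; the crossover point depends on sustained throughput.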
Can I fine-tune Qwen models on my own data?
Yes! All Qwen models are released under Apache 2.0 license, meaning you can fine-tune them on your specific use cases and own the resulting model weights. Deploy anywhere without restrictions or licensing fees.
Which Qwen model should I choose for my use case?
- For general applications: Qwen3-235B-A22B for maximum performance, Qwen2.5-7B-Turbo for efficiency.
- For coding: Qwen2.5-Coder-32B.
- For vision: Qwen2.5-VL-72B.
- For pure reasoning: QwQ-32B.
- For production: Qwen2.5-72B for balanced performance.
How do I enable hybrid thinking modes in Qwen3?
Simply add the enable_thinking=True parameter to your API calls with Qwen3-235B-A22B. The model will automatically determine when to use step-by-step reasoning vs instant responses. You can also use /think and /no_think tags in prompts for manual control.
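A sketch of the two control styles described above, shown as request payloads. The exact placement of the `enable_thinking` field in Together's request body is an assumption and should be verified against the API reference; the `/think` and `/no_think` tags go directly in the prompt text.

```python
# Request-level toggle: enable_thinking switches deep reasoning on or off.
# Field placement here is an assumption; confirm against Together's API docs.
deep = {
    "model": "Qwen/Qwen3-235B-A22B-FP8",
    "messages": [{"role": "user", "content": "Prove that sqrt(2) is irrational."}],
    "enable_thinking": True,
}

# In-prompt override: /no_think requests an instant answer for this turn.
fast = {
    "model": "Qwen/Qwen3-235B-A22B-FP8",
    "messages": [{"role": "user", "content": "/no_think What is the capital of France?"}],
}
```

Mixing both is possible: set the request-level default once, then override per turn with the in-prompt tags.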
What context lengths are supported across models?
- Qwen3-235B-A22B: 128K tokens
- Qwen2.5-7B-Turbo: 131K tokens
- QwQ-32B: 131K tokens
- Qwen2.5-Coder-32B: 128K tokens
- Qwen2.5-72B: 32K tokens
All with optimized attention mechanisms for efficient processing.