Qwen
Think Deeper. Act Faster.
Open-source AI family powering instant chat and deep reasoning. Deploy hybrid models with support for 119 languages and strong coding, math, and vision skills.

Get Started in Minutes
Open-source drop-in for OpenAI: same API, hybrid thinking, multilingual support. Switch to Qwen instantly.
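A minimal sketch of what "same API" means in practice: you send the standard OpenAI-format chat-completion payload and only the base URL changes. The request is built with the stdlib here for illustration; the exact model slug and endpoint path should be confirmed against Together's API reference.

```python
import json
import urllib.request

TOGETHER_URL = "https://api.together.xyz/v1/chat/completions"

def build_request(prompt: str) -> urllib.request.Request:
    """Build an OpenAI-format chat completion request aimed at Together.

    The payload shape is the standard OpenAI chat schema; the model slug
    below follows this page's naming and is an assumption.
    """
    payload = {
        "model": "Qwen/Qwen3-235B-A22B-FP8",
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        TOGETHER_URL,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": "Bearer YOUR_TOGETHER_API_KEY",
            "Content-Type": "application/json",
        },
    )
```

Because the wire format is unchanged, existing OpenAI SDK code typically only needs its base URL (and API key) swapped to point at Together.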
Why Qwen on Together AI?
From sub-1B to 200B+ parameters - one model family that adapts to every scale
An open-source AI family whose newest flagship models introduce breakthrough hybrid thinking, beat OpenAI o1 on benchmarks, and offer specialized variants for every domain, with exceptional fine-tuning and complete model ownership.
VERIFIED BENCHMARK LEADERSHIP
Qwen3-235B-A22B beats OpenAI o1 across key metrics: 85.7 vs 74.3 on AIME'24, 98.0% vs 96.4% on MATH-500, and 2,056 vs 1,891 CodeForces rating. Achieves frontier performance using only 22B activated parameters vs o1's larger footprint.
Outperforms o1 with 90% lower computational cost
HYBRID THINKING ARCHITECTURE
Revolutionary dual-mode system that switches between instant responses and deep reasoning on demand. Qwen3 models can provide lightning-fast answers for simple queries or engage in step-by-step thinking for complex problems - all controlled by a simple parameter.
Single model replaces multiple specialized models
COMPLETE OPEN SOURCE ECOSYSTEM
Full model weights available under Apache 2.0 license with specialized variants for every use case. Deploy anywhere - cloud, VPC, or on-premise. You own your models, your data, and your AI pipeline completely.
Complete model ownership vs closed-source dependencies
Meet the Qwen Herd
From ultralight edge models to reasoning powerhouses, choose the Qwen variant that perfectly fits your needs.

The real* Qwen capybara seen hanging out near the Golden Gate Bridge.
Breakthrough Technical Innovations
Qwen models introduce breakthrough architectural advances that redefine open-source AI capabilities.
Hybrid Thinking Architecture
First open-source models with controllable thinking modes: toggle between instant answers & step-by-step reasoning via one parameter—no external CoT prompting.
Scalable thinking budget control
Mixture of Experts Efficiency
Qwen3-235B-A22B’s advanced MoE activates only 22B of 235B parameters, achieving frontier performance with only a fraction of the usual compute.
235B params, 22B active per token
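The efficiency claim above is simple arithmetic worth making explicit: with only 22B of 235B parameters activated per token, under a tenth of the model's weights participate in each forward pass.

```python
# Back-of-envelope: per-token active-parameter fraction for Qwen3-235B-A22B,
# using the totals quoted above (235B total, 22B active).
total_params = 235e9
active_params = 22e9
active_fraction = active_params / total_params
print(f"{active_fraction:.1%} of parameters active per token")  # ~9.4%
```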
Vision-Language Integration
Qwen2.5-VL powers visual reasoning, video understanding, and structured output with native resolution—analyzing hour-long videos with precise localization.
Hour-long video comprehension
Specialized Model Variants
Purpose-built AI for coding (Qwen2.5-Coder), vision (Qwen2.5-VL), and reasoning (QwQ), tuned for domain excellence yet versatile for general tasks.
4+ specialized model families
Reinforcement Learning
QwQ-32B uses pure RL training to master complex reasoning, matching far larger closed models while keeping its reasoning transparent.
Pure RL achieves frontier-class performance
Advanced Context Processing
Up to 131K-token context via optimized attention. RoPE, SwiGLU, RMSNorm, and advanced mechanisms enable efficient long-text processing.
Up to 131K context length
Deploy on Together AI
Access Qwen models through Together's optimized inference platform with enterprise-grade security and performance guarantees.
Serverless Endpoints
Pay-per-token pricing with automatic scaling. Perfect for getting started or variable workloads.
Best for:
Prototyping and development
Variable or unpredictable traffic
Cost optimization for low volume
Getting started quickly
Qwen3-235B-A22B-FP8:
$0.20 input/$0.60 output
Qwen2.5-VL-72B:
$1.95 input/$8 output
On-Demand Dedicated
Dedicated GPU capacity with guaranteed performance. No rate limits. Built for production.
Best for:
Production applications
Extended model library access
Predictable latency requirements
Enterprise SLA needs
Qwen3-32B:
$0.11/minute (2x H100)
Qwen2.5-VL-72B:
$0.22/minute (4x H100)
Monthly Reserved
Committed GPU capacity with enterprise features and volume discounts. Optimized for scale.
Best for:
High-volume committed usage
Enterprise security requirements
Priority hardware access
Maximum cost efficiency
Reserved GPU pricing:
Starting $0.98/hr
Volume Discounts:
Up to 40% savings
Enterprise-Grade Security
Your data and models remain fully under your control with industry-leading security standards.
SOC 2 Type II
Comprehensive security controls audited by third parties.
HIPAA Compliant
Healthcare-grade data protection for sensitive workloads.
Model Ownership
You own your fine-tuned models and can deploy anywhere.
US-Based Infrastructure
Models hosted on secure North American servers with strict data sovereignty controls.
Real Performance Benchmarks
See how Qwen3-235B-A22B delivers SOTA performance, outperforming leading models across reasoning, math & coding.
Try Qwen Models Now - Free
Experience the performance difference in Together Chat.
Frequently Asked Questions
What makes Qwen3's hybrid thinking different from other reasoning models?
Qwen3-235B-A22B introduces controllable thinking modes where you can switch between instant responses and step-by-step reasoning with a simple parameter. Unlike models that always use long reasoning chains, Qwen3 adapts its computational budget to the task complexity, providing optimal efficiency with MoE architecture.
How does Qwen performance compare to OpenAI o1?
Qwen3-235B-A22B outperforms OpenAI o1 on key benchmarks: 85.7 vs 74.3 on AIME'24, 98.0% vs 96.4% on MATH-500, and 2,056 vs 1,891 CodeForces rating, while using only 22B activated parameters vs o1's larger computational footprint.
What are the current pricing rates for Qwen models?
- Qwen3-235B-A22B-FP8: $0.20 input/$0.60 output.
- Qwen2.5-VL-72B: $1.95 input/$8 output.
- QwQ-32B: $1.20 per 1M tokens.
- Qwen2.5-Coder-32B: $0.80 per 1M tokens.
- Qwen2.5-7B-Turbo: $0.30 per 1M tokens.
Volume discounts available for enterprise.
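A quick way to turn the rates above into a monthly estimate, assuming the listed input/output rates are per 1M tokens (check the pricing page) and a hypothetical workload of 50M input and 10M output tokens:

```python
# Monthly serverless cost sketch for Qwen3-235B-A22B-FP8 at the rates
# listed above, assuming per-1M-token pricing (an assumption).
INPUT_RATE = 0.20   # $ per 1M input tokens
OUTPUT_RATE = 0.60  # $ per 1M output tokens

def monthly_cost(input_millions: float, output_millions: float) -> float:
    """Estimated monthly spend in dollars for a given token volume."""
    return input_millions * INPUT_RATE + output_millions * OUTPUT_RATE

print(f"${monthly_cost(50, 10):.2f}")  # $16.00
```

At this volume, serverless pay-per-token is far cheaper than a dedicated endpoint; the crossover point depends on sustained throughput.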
Can I fine-tune Qwen models on my own data?
Yes! All Qwen models are released under Apache 2.0 license, meaning you can fine-tune them on your specific use cases and own the resulting model weights. Deploy anywhere without restrictions or licensing fees.
Which Qwen model should I choose for my use case?
- For general applications: Qwen3-235B-A22B for maximum performance, Qwen2.5-7B-Turbo for efficiency.
- For coding: Qwen2.5-Coder-32B.
- For vision: Qwen2.5-VL-72B.
- For pure reasoning: QwQ-32B.
- For production: Qwen2.5-72B for balanced performance.
How do I enable hybrid thinking modes in Qwen3?
Simply add the enable_thinking=True parameter to your API calls with Qwen3-235B-A22B. The model will automatically determine when to use step-by-step reasoning vs instant responses. You can also use /think and /no_think tags in prompts for manual control.
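A sketch of the two control styles described above, shown as request payloads. The exact placement of the `enable_thinking` field in Together's request body is an assumption and should be verified against the API reference; the `/think` and `/no_think` tags go directly in the prompt text.

```python
# Request-level toggle: enable_thinking switches deep reasoning on or off.
# Field placement here is an assumption; confirm against Together's API docs.
deep = {
    "model": "Qwen/Qwen3-235B-A22B-FP8",
    "messages": [{"role": "user", "content": "Prove that sqrt(2) is irrational."}],
    "enable_thinking": True,
}

# In-prompt override: /no_think requests an instant answer for this turn.
fast = {
    "model": "Qwen/Qwen3-235B-A22B-FP8",
    "messages": [{"role": "user", "content": "/no_think What is the capital of France?"}],
}
```

Mixing both is possible: set the request-level default once, then override per turn with the in-prompt tags.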
What context lengths are supported across models?
- Qwen3-235B-A22B: 128K tokens
- Qwen2.5-7B-Turbo: 131K tokens
- QwQ-32B: 131K tokens
- Qwen2.5-Coder-32B: 128K tokens
- Qwen2.5-72B: 32K tokens
All with optimized attention mechanisms for efficient processing.