Qwen
Think, Respond, Act
Open-source AI family that powers everything from deep reasoning to autonomous coding. Deploy cutting-edge models with thinking modes, agentic capabilities, and specialized variants for coding, math, and vision tasks.

Get Started in Minutes
Open-source drop-in for OpenAI: same API, hybrid thinking, multilingual support. Switch to Qwen instantly.
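A minimal sketch of what "same API" means in practice: the request below uses OpenAI's chat-completions wire format pointed at Together's endpoint. The endpoint path and model slug are assumptions for illustration; check the Together docs and model list for exact names.

```python
# Sketch: the OpenAI-compatible request shape Together serves for Qwen.
# Endpoint and model slug are illustrative -- confirm against Together's docs.
import json
import urllib.request

def build_request(prompt: str,
                  model: str = "Qwen/Qwen3-235B-A22B-Instruct-2507"):
    """Build a chat-completions request identical to OpenAI's wire format."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        "https://api.together.xyz/v1/chat/completions",
        data=json.dumps(body).encode(),
        headers={"Authorization": "Bearer YOUR_TOGETHER_API_KEY",
                 "Content-Type": "application/json"},
    )

req = build_request("Hello, Qwen!")
print(req.full_url)
```

Because the format matches OpenAI's, existing OpenAI SDK integrations typically only need a new base URL, API key, and model name.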
Why Qwen on Together AI?
From sub-1B to 480B+ parameters - one model family that adapts to every scale
Open-source AI family whose newest flagship models feature breakthrough hybrid thinking, outperform OpenAI o3 across verified benchmarks, and offer specialized variants for every domain, all while maintaining complete transparency and model ownership.
VERIFIED BENCHMARK LEADERSHIP
Qwen3-235B-A22B-Thinking-2507 beats OpenAI o3 across key metrics: 92 vs 88 on AIME'25, 83 vs 82.5 on HMMT'25, and a 2,100+ vs 2,000+ CodeForces rating. It achieves frontier performance while activating only 22B parameters per token, a fraction of o3's footprint.
Outperforms o3 with 90% lower computational cost
COMPLETE THINKING SPECTRUM
A hybrid system that switches between instant responses, deep reasoning, and autonomous action on demand. Qwen3 models can provide lightning-fast answers to simple queries or engage in step-by-step thinking for complex problems.
Think → Respond → Act in one family
COMPLETE OPEN SOURCE ECOSYSTEM
Full model weights available under Apache 2.0 license with specialized variants for every use case. Deploy anywhere - cloud, VPC, or on-premise. You own your models, your data, and your AI pipeline completely.
Complete model ownership vs closed-source dependencies
Meet the Qwen Herd
From ultralight edge models to reasoning powerhouses, choose the Qwen variant that perfectly fits your needs.

The real* Qwen capybara seen hanging out near the Golden Gate Bridge.
Breakthrough Technical Innovations
Qwen models introduce breakthrough architectural advances that redefine open-source AI capabilities.
Agentic AI Architecture
First open-source models with native agentic capabilities. Complete autonomous task execution with tool usage, multi-turn interaction, and real-world problem solving comparable to Claude Sonnet 4.
Autonomous task completion
Massive Scale MoE
Qwen3-Coder-480B-A35B represents the largest open-source coding model ever released. Advanced MoE architecture activates only 35B parameters from 480B total, delivering frontier performance at manageable computational cost.
480B params, 35B active per token
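The efficiency claim can be illustrated with a toy top-k gating step, the core mechanism of any MoE layer. This is a sketch for intuition only, not Qwen's actual routing code:

```python
# Toy mixture-of-experts routing: each token activates only the top-k
# experts by gate score, so most parameters stay idle per token.
# Illustrative only -- not Qwen's actual implementation.
from math import exp

def softmax(xs):
    m = max(xs)
    es = [exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def route(gate_scores, k=2):
    """Pick the top-k experts and renormalize their mixing weights."""
    probs = softmax(gate_scores)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    z = sum(probs[i] for i in top)
    return {i: probs[i] / z for i in top}   # expert index -> mixing weight

# 8 experts, activate 2: only 2/8 of the expert parameters run per token.
weights = route([0.1, 2.0, -1.0, 0.5, 1.5, 0.0, -0.5, 0.3], k=2)
print(weights)
```

Scaled up, the same idea is how a 480B-parameter model can run a forward pass that touches only 35B parameters.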
Vision-Language Integration
Qwen2.5-VL powers visual reasoning, video understanding, and structured output with native resolution—analyzing hour-long videos with precise localization.
Hour-long video comprehension
Specialized Model Variants
Purpose-built AI for coding (Qwen2.5-Coder), vision (Qwen2.5-VL), and reasoning (QwQ), tuned for domain excellence yet versatile for general tasks.
4+ specialized model families
Reinforcement Learning
QwQ-32B uses pure RL training to master complex reasoning, matching far larger closed models while keeping its reasoning transparent.
Pure RL achieves frontier-class performance
Advanced Context Processing
Context windows up to 256K tokens natively (1M with extrapolation) on Qwen3 models, and 131K on Qwen2.5-7B-Turbo. RoPE, SwiGLU, RMSNorm, and optimized attention enable efficient long-text processing.
Up to 1M context with extrapolation
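As a concrete look at one ingredient named above, here is a minimal rotary position embedding (RoPE) sketch in plain Python: each (even, odd) pair of dimensions is rotated by a position-dependent angle, which is what lets attention depend on relative position. Illustrative only, not Qwen's implementation.

```python
# Minimal RoPE: rotate each (even, odd) dimension pair of a query/key
# vector by a position-dependent angle. Illustrative sketch only.
import math

def rope(vec, pos, base=10000.0):
    """Apply rotary position embedding to one vector at position `pos`."""
    out = []
    d = len(vec)
    for i in range(0, d, 2):
        theta = pos * (base ** (-i / d))       # per-pair rotation frequency
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out += [x * c - y * s, x * s + y * c]  # standard 2-D rotation
    return out

q = rope([1.0, 0.0, 1.0, 0.0], pos=3)
print(q)
```

Because rotation preserves vector norms, RoPE encodes position without distorting the magnitudes attention operates on, one reason it extrapolates well to long contexts.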
Deploy on Together AI
Access Qwen models through Together's optimized inference platform with enterprise-grade security and performance guarantees.
Serverless Endpoints
Pay-per-token pricing with automatic scaling. Perfect for getting started or variable workloads.
Best for:
Prototyping and development
Variable or unpredictable traffic
Cost optimization for low volume
Getting started quickly
Qwen3-235B-A22B-FP8:
$0.20 input/$0.60 output
Qwen2.5-VL-72B:
$1.95 input/$8.00 output
On-Demand Dedicated
Dedicated GPU capacity with guaranteed performance. No rate limits. Built for production.
Best for:
Production applications
Extended model library access
Predictable latency requirements
Enterprise SLA needs
Qwen3-Coder:
$0.11/minute (2x H100)
Qwen2.5-VL-72B:
$0.22/minute (4x H100)
Monthly Reserved
Committed GPU capacity with enterprise features and volume discounts. Optimized for scale.
Best for:
High-volume committed usage
Enterprise security requirements
Priority hardware access
Maximum cost efficiency
Reserved GPU pricing:
Starting $0.98/hr
Volume Discounts:
Up to 40% savings
Enterprise-Grade Security
Your data and models remain fully under your control with industry-leading security standards.
SOC 2 Type II
Comprehensive security controls audited by third parties.
HIPAA Compliant
Healthcare-grade data protection for sensitive workloads.
Model Ownership
You own your fine-tuned models and can deploy anywhere.
US-Based Infrastructure
Models hosted on secure North American servers with strict data sovereignty controls.
Real Performance Benchmarks
See how Qwen3-235B-A22B delivers SOTA performance, outperforming leading models across reasoning, math & coding.
Try Qwen Models Now - Free
Experience the performance difference in Together Chat.
Frequently Asked Questions
What makes Qwen3's agentic capabilities different from other models?
Qwen3-Coder-480B-A35B is trained specifically for autonomous task completion with long-horizon RL across 20,000 parallel environments. It achieves Claude Sonnet 4 level performance on complex coding tasks and tool usage while remaining completely open-source.
How does Qwen performance compare to OpenAI o3 and o4-mini?
Qwen3-235B-A22B-Thinking-2507 outperforms OpenAI o3 on key benchmarks: 92 vs 88 on AIME'25 and 83 vs 82.5 on HMMT'25. Our models achieve frontier performance while activating significantly fewer parameters thanks to MoE efficiency.
What makes Qwen3's hybrid thinking different from other reasoning models?
Qwen3-235B-A22B introduces controllable thinking modes where you can switch between instant responses and step-by-step reasoning with a simple parameter. Unlike models that always use long reasoning chains, Qwen3 adapts its computational budget to the task complexity, providing optimal efficiency with MoE architecture.
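A sketch of what the toggle looks like in practice. The exact switch depends on the serving stack; Qwen3 documents a chat-template `enable_thinking` flag and `/think` / `/no_think` soft switches in the prompt, so treat the model slug and field placement below as assumptions:

```python
# Two requests to the same Qwen3 model: one asking for step-by-step
# thinking, one for an instant answer. Uses Qwen3's documented soft
# switches in the user prompt; the model slug is an assumption.
import json

def chat_body(prompt: str, thinking: bool):
    suffix = " /think" if thinking else " /no_think"
    return {
        "model": "Qwen/Qwen3-235B-A22B-FP8",   # assumed slug
        "messages": [{"role": "user", "content": prompt + suffix}],
    }

deep = chat_body("Prove there are infinitely many primes.", thinking=True)
fast = chat_body("What is the capital of France?", thinking=False)
print(json.dumps(deep, indent=2))
```

The key point is that both requests hit the same model; only the switch changes how much reasoning compute the response spends.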
How does Qwen performance compare to OpenAI o1?
Qwen3-235B-A22B outperforms OpenAI o1 on key benchmarks: 85.7 vs 74.3 on AIME'24, 98.0% vs 96.4% on MATH-500, and 2,056 vs 1,891 CodeForces rating, while using only 22B activated parameters vs o1's larger computational footprint.
What are the current pricing rates for the new Qwen3 models?
- Qwen3-Coder-480B: $2.00 per 1M tokens
- Qwen3-235B-Thinking: $0.65 input/$3.00 output per 1M tokens
- Qwen3-235B-Instruct: $0.20 input/$0.60 output per 1M tokens
- Qwen3-235B-A22B-FP8: $0.20 input/$0.60 output per 1M tokens
- Qwen2.5-VL-72B: $1.95 input/$8.00 output per 1M tokens
- QwQ-32B: $1.20 per 1M tokens
- Qwen2.5-Coder-32B: $0.80 per 1M tokens
- Qwen2.5-7B-Turbo: $0.30 per 1M tokens
Volume discounts available for enterprise.
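For budgeting, the per-token math is simple; here is a small estimator using the list prices above (rates hard-coded from this page; confirm current pricing before relying on them):

```python
# Estimate serverless cost from the per-1M-token rates listed above.
# Rates copied from this page; always confirm current pricing.
RATES = {  # model: (input $/1M tokens, output $/1M tokens)
    "Qwen3-235B-Instruct": (0.20, 0.60),
    "Qwen3-235B-Thinking": (0.65, 3.00),
    "Qwen2.5-VL-72B": (1.95, 8.00),
}

def estimate(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the dollar cost for one request."""
    rin, rout = RATES[model]
    return (input_tokens * rin + output_tokens * rout) / 1_000_000

# 100K prompt tokens + 20K completion tokens on the Instruct model:
print(f"${estimate('Qwen3-235B-Instruct', 100_000, 20_000):.3f}")
```

At these rates, a 100K-token prompt with a 20K-token completion on the Instruct model costs about $0.032.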
Can I fine-tune Qwen models on my own data?
Yes! All Qwen models are released under Apache 2.0 license, meaning you can fine-tune them on your specific use cases and own the resulting model weights. Deploy anywhere without restrictions or licensing fees.
Which Qwen model should I choose for my use case?
- For agentic tasks: Qwen3-Coder-480B for autonomous coding and workflows.
- For deep reasoning: Qwen3-235B-Thinking.
- For fast responses: Qwen3-235B-Instruct.
- For vision: Qwen2.5-VL-72B.
- For efficiency: Qwen2.5-7B-Turbo.
How do I enable thinking vs agentic modes?
Qwen3-235B-Thinking automatically uses thinking mode. For agentic capabilities, use Qwen3-Coder-480B with tool access. Both models support up to 1M context length with extrapolation for complex tasks.
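Agentic use rides on the standard OpenAI-style tool-calling format. Below is a sketch of a request that gives Qwen3-Coder one tool; the `run_tests` tool is hypothetical and the model slug is an assumption:

```python
# OpenAI-style tool-calling request for an agentic Qwen3-Coder run.
# The `run_tests` tool is hypothetical; the model slug is an assumption.
def agentic_body(task: str):
    return {
        "model": "Qwen/Qwen3-Coder-480B-A35B-Instruct",
        "messages": [{"role": "user", "content": task}],
        "tools": [{
            "type": "function",
            "function": {
                "name": "run_tests",            # hypothetical tool
                "description": "Run the project's test suite.",
                "parameters": {
                    "type": "object",
                    "properties": {"path": {"type": "string"}},
                    "required": ["path"],
                },
            },
        }],
    }

body = agentic_body("Fix the failing test in src/utils.py")
print([t["function"]["name"] for t in body["tools"]])
```

In an agentic loop, the model returns tool calls, your code executes them, and the results are appended as tool messages until the task completes.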
What context lengths are supported across models?
- Qwen3-Coder-480B: 256K tokens (1M with extrapolation)
- Qwen3-235B models: 256K tokens (1M with extrapolation)
- Qwen2.5-VL-72B: dynamic, scaling with input resolution
- Qwen2.5-7B-Turbo: 131K tokens
What deployment options are available for Qwen3?
All models support Serverless pay-per-token pricing, On-Demand Dedicated reserved capacity, and VPC/On-Premise deployment. Qwen3 models excel in both cloud and edge deployment scenarios with complete model ownership.