

Qwen

Think Deeper. Act Faster.

Open-source AI family powering instant chat and deep reasoning. Deploy hybrid models with support for 119 languages and strong coding, math, and vision skills.

Get Started in Minutes

Open-source drop-in for OpenAI: same API, hybrid thinking, multilingual support. Switch to Qwen instantly.


# Install the Together AI library
pip install together

# Get started with Qwen3
from together import Together

client = Together()

response = client.chat.completions.create(
    model="Qwen/Qwen3-235B-A22B-fp8-tput",
    messages=[
        {
            "role": "user",
            # The /think tag enables step-by-step reasoning; use /no_think for instant answers
            "content": "Solve this step by step /think",
        }
    ],
)

print(response.choices[0].message.content)

View API Docs

Why Qwen on Together AI?

From sub-1B to 200B+ parameters: one model family that adapts to every scale

An open-source AI family whose newest flagship models feature breakthrough hybrid thinking, beat OpenAI o1 on reasoning benchmarks, and offer specialized variants for every domain, with straightforward fine-tuning and complete model ownership.

Meet the Qwen Herd

From ultralight edge models to reasoning powerhouses, choose the Qwen variant that perfectly fits your needs.

Qwen3-235B-A22B-FP8

Flagship hybrid reasoning model

  • Active Params: 22B

  • Total Params: 235B

  • Context: 128K

  • AIME'24 Score: 85.7

Key Strengths:

  • Beats OpenAI o1 on reasoning benchmarks

  • MoE efficiency delivers frontier performance

  • Hybrid thinking modes

Qwen2.5-VL-72B

Vision-language powerhouse

  • Parameters: 72B

  • Vision-capable

  • Dynamic resolution

  • Hour-long video support

Key Strengths:

  • Advanced visual reasoning and video understanding

  • Structured output generation

  • Agentic capabilities

QwQ-32B

Specialized reasoning model

  • Parameters: 32B

  • Context: 131K

  • RL-trained

  • MATH Score: 98.0%

Key Strengths:

  • Excels in complex reasoning tasks

  • Pure RL methodology

  • Step-by-step transparent thinking

Qwen2.5-Coder-32B

SOTA open-source coding model

  • Parameters: 32B

  • Context: 128K

  • HumanEval: 92.7

  • MBPP: 90.2

Key Strengths:

  • Advanced code generation and reasoning

  • Code fixing capabilities

  • Matches GPT-4o coding performance

Qwen2.5-72B

High-performance dense model

  • Parameters: 72B

  • Context: 32K

  • Quantization: FP8

  • Architecture: Dense

Key Strengths:

  • Advanced language processing

  • Production-ready

  • Balanced performance and cost

Qwen2.5-7B-Turbo

Ultra-efficient model

  • Parameters: 7.61B

  • Context: 131K

  • Quantization: FP8

  • Turbo speed

Key Strengths:

  • Ultra-efficient inference

  • Edge deployment ready

  • Cost-effective for high volume

The real* Qwen capybara seen hanging out near the Golden Gate Bridge.

Breakthrough Technical Innovations

Qwen models introduce breakthrough architectural advances that redefine open-source AI capabilities.

  • Hybrid Thinking Architecture

    First open-source models with controllable thinking modes: toggle between instant answers & step-by-step reasoning via one parameter—no external CoT prompting.

    Scalable thinking budget control

  • Mixture of Experts Efficiency

    Qwen3-235B-A22B’s advanced MoE activates only 22B of 235B parameters, achieving frontier performance with only a fraction of the usual compute.

    235B params, 22B active per token

  • Vision-Language Integration

    Qwen2.5-VL powers visual reasoning, video understanding, and structured output with native resolution—analyzing hour-long videos with precise localization.

    Hour-long video comprehension

  • Specialized Model Variants

    Purpose-built AI for coding (Qwen2.5-Coder), vision (Qwen2.5-VL), and reasoning (QwQ), tuned for domain excellence yet versatile for general tasks.

    4+ specialized model families

  • Reinforcement Learning

    QwQ-32B uses pure RL training to master complex reasoning, matching far larger closed models while keeping its reasoning transparent.

    Pure RL achieves frontier-class performance

  • Advanced Context Processing

    Up to 131K-token context via optimized attention. RoPE, SwiGLU, RMSNorm, and advanced mechanisms enable efficient long-text processing.

    Up to 131K context length
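The vision-language integration above is exercised through Together's OpenAI-compatible chat format. A minimal sketch of a multimodal message payload, assuming `image_url` content parts are accepted and using a placeholder image URL:

```python
# Multimodal message payload for Qwen2.5-VL, assuming Together's
# OpenAI-compatible format with image_url content parts.
image_url = "https://example.com/chart.png"  # placeholder image

messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe what this chart shows."},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }
]
```

Pass this `messages` list to `client.chat.completions.create` with a Qwen2.5-VL model to get a text description of the image back.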

Deploy on Together AI

Access Qwen models through Together's optimized inference platform with enterprise-grade security and performance guarantees.

  • Serverless Endpoints

    Pay-per-token pricing with automatic scaling. Perfect for getting started or variable workloads.

    Best for:

    • Prototyping and development

    • Variable or unpredictable traffic

    • Cost optimization for low volume

    • Getting started quickly

    Qwen3-235B-A22B-FP8:
    $0.20 input/$0.60 output

    Qwen2.5-VL-72B:
    $1.95 input/$8 output

  • On-Demand Dedicated

    Dedicated GPU capacity with guaranteed performance. No rate limits. Built for production.

    Best for:

    • Production applications

    • Extended model library access

    • Predictable latency requirements

    • Enterprise SLA needs

    Qwen3-32B:
    $0.11/minute (2x H100)

    Qwen2.5-VL-72B:
    $0.22/minute (4x H100)

  • Monthly Reserved

    Committed GPU capacity, enterprise features and volume discounts. Optimized for scale.

    Best for:

    • High-volume committed usage

    • Enterprise security requirements

    • Priority hardware access

    • Maximum cost efficiency

    Reserved GPU pricing:
    Starting $0.98/hr

    Volume Discounts:

    Up to 40% savings

Enterprise-Grade Security

Your data and models remain fully under your control with industry-leading security standards.

  • SOC 2 Type II


    Comprehensive security controls audited by third parties.

  • HIPAA Compliant

    Healthcare-grade data protection for sensitive workloads.

  • Model Ownership

    You own your fine-tuned models and can deploy anywhere.

  • US-Based Infrastructure

    Models hosted on secure North American servers with strict data sovereignty controls.

Real Performance Benchmarks

See how Qwen3-235B-A22B delivers frontier performance, outperforming OpenAI o1 and DeepSeek-R1 across reasoning, math, and coding.

| Model | MATH-500 | AIME'24 | AIME'25 | LiveCodeBench v5 | CodeForces Rating |
|---|---|---|---|---|---|
| Qwen3-235B-A22B | 98.0% | 85.7 | 81.5 | 70.7 | 2056 |
| OpenAI o1 | 96.4% | 74.3 | 79.2 | 63.9 | 1891 |
| DeepSeek-R1 | 97.3% | 79.8 | 70.0 | 64.3 | 2029 |
| Gemini 2.5 Pro | 98.0% | 92.0 | 86.7 | 70.4 | 2001 |

Try Qwen Models Now - Free

Experience the performance difference in Together Chat.

Frequently Asked Questions

What makes Qwen3's hybrid thinking different from other reasoning models?

Qwen3-235B-A22B introduces controllable thinking modes where you can switch between instant responses and step-by-step reasoning with a simple parameter. Unlike models that always use long reasoning chains, Qwen3 adapts its computational budget to the task complexity, providing optimal efficiency with MoE architecture.

How does Qwen performance compare to OpenAI o1?

Qwen3-235B-A22B outperforms OpenAI o1 on key benchmarks: 85.7 vs 74.3 on AIME'24, 98.0% vs 96.4% on MATH-500, and 2,056 vs 1,891 CodeForces rating, while using only 22B activated parameters vs o1's larger computational footprint.

What are the current pricing rates for Qwen models?

- Qwen3-235B-A22B-FP8: $0.20 input / $0.60 output
- Qwen2.5-VL-72B: $1.95 input / $8 output
- QwQ-32B: $1.20 per 1M tokens
- Qwen2.5-Coder-32B: $0.80 per 1M tokens
- Qwen2.5-7B-Turbo: $0.30 per 1M tokens

Volume discounts available for enterprise.
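As a quick sanity check on the serverless rates above, a cost estimate for the flagship model, assuming (as the per-model lines state for the other variants) prices are USD per million tokens, with input and output billed separately:

```python
# Rough serverless cost estimate for Qwen3-235B-A22B-FP8, assuming the
# listed prices are USD per million tokens, billed separately by direction.
PRICE_IN, PRICE_OUT = 0.20, 0.60  # $/1M tokens

def monthly_cost(input_tokens: int, output_tokens: int) -> float:
    return input_tokens / 1e6 * PRICE_IN + output_tokens / 1e6 * PRICE_OUT

# e.g. 50M input tokens and 10M output tokens per month:
print(round(monthly_cost(50_000_000, 10_000_000), 2))  # 16.0
```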

Can I fine-tune Qwen models on my own data?

Yes! All Qwen models are released under Apache 2.0 license, meaning you can fine-tune them on your specific use cases and own the resulting model weights. Deploy anywhere without restrictions or licensing fees.

Which Qwen model should I choose for my use case?

- For general applications: Qwen3-235B-A22B for maximum performance, Qwen2.5-7B-Turbo for efficiency.
- For coding: Qwen2.5-Coder-32B.
- For vision: Qwen2.5-VL-72B.
- For pure reasoning: QwQ-32B.
- For production: Qwen2.5-72B for balanced performance.

How do I enable hybrid thinking modes in Qwen3?

Simply add the enable_thinking=True parameter to your API calls with Qwen3-235B-A22B. The model will automatically determine when to use step-by-step reasoning vs instant responses. You can also use /think and /no_think tags in prompts for manual control.
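For prompt-side control, the tags can simply be appended to the user message; a minimal sketch (the `with_thinking` helper is illustrative, not part of the SDK):

```python
# Toggle Qwen3's hybrid thinking from the prompt via the /think and
# /no_think tags. The helper name is illustrative, not an SDK function.
def with_thinking(prompt: str, think: bool) -> str:
    tag = "/think" if think else "/no_think"
    return f"{prompt} {tag}"

messages = [
    {"role": "user", "content": with_thinking("Solve 37 * 43 step by step", think=True)}
]
```

Pass this `messages` list to `client.chat.completions.create` as in the quickstart above; with `think=False` the model skips the reasoning chain and answers directly.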

What context lengths are supported across models?

- Qwen3-235B-A22B: 128K tokens
- Qwen2.5-7B-Turbo: 131K tokens
- QwQ-32B: 131K tokens
- Qwen2.5-Coder-32B: 128K tokens
- Qwen2.5-72B: 32K tokens

All with optimized attention mechanisms for efficient processing.