
Qwen3-Next-80B-A3B-Instruct

Next-generation instruction model with extreme efficiency

About model

Instruction-Optimized Efficiency:
Qwen3-Next-80B-A3B-Instruct features a highly sparse Mixture-of-Experts (MoE) structure that activates only 3B of its 80B parameters during inference. It supports instruct mode only, with no thinking blocks, and delivers performance on par with Qwen3-235B-A22B-Instruct-2507 on certain benchmarks at less than 10% of the training cost, with more than 10x higher throughput on contexts longer than 32K tokens.

  • API usage

    • cURL
    • Python
    • TypeScript

    Endpoint:

    Qwen/Qwen3-Next-80B-A3B-Instruct

    curl -X POST "https://api.together.xyz/v1/chat/completions" \
      -H "Authorization: Bearer $TOGETHER_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{
        "model": "Qwen/Qwen3-Next-80B-A3B-Instruct",
        "messages": [
          {
            "role": "user",
            "content": "What are some fun things to do in New York?"
          }
        ]
    }'
    
    from together import Together
    
    client = Together()
    
    response = client.chat.completions.create(
      model="Qwen/Qwen3-Next-80B-A3B-Instruct",
      messages=[
        {
          "role": "user",
          "content": "What are some fun things to do in New York?"
        }
      ]
    )
    print(response.choices[0].message.content)
    
    import Together from 'together-ai';
    const together = new Together();
    
    const completion = await together.chat.completions.create({
      model: 'Qwen/Qwen3-Next-80B-A3B-Instruct',
      messages: [
        {
          role: 'user',
          content: 'What are some fun things to do in New York?'
         }
      ],
    });
    
    console.log(completion.choices[0].message.content);
    
  • Model card

    Architecture Overview:
    • 48 layers with 2048 hidden dimension in a hybrid layout (linear-attention Gated DeltaNet layers interleaved with standard gated attention)
    • 512 total experts with 10 activated and 1 shared expert per MoE layer
    • Multi-token prediction mechanism for faster inference
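    The expert counts above can be sketched in miniature: a softmax gate scores all routed experts, the top 10 are activated per token, and one shared expert is always on. This is an illustrative toy sketch with random gate scores, not the actual Qwen3-Next routing code.

```python
import math
import random

NUM_EXPERTS = 512  # total routed experts per MoE layer
TOP_K = 10         # routed experts activated per token
NUM_SHARED = 1     # shared expert, always active

def route(gate_logits):
    """Pick the TOP_K routed experts by softmax gate score.

    Returns (expert_indices, normalized_weights). Illustrative only:
    real implementations fuse routing with expert dispatch on-device.
    """
    m = max(gate_logits)
    probs = [math.exp(x - m) for x in gate_logits]
    total = sum(probs)
    probs = [p / total for p in probs]
    top = sorted(range(NUM_EXPERTS), key=lambda i: probs[i], reverse=True)[:TOP_K]
    weight_sum = sum(probs[i] for i in top)
    weights = [probs[i] / weight_sum for i in top]  # renormalize over selected
    return top, weights

random.seed(0)
logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
experts, weights = route(logits)

active = len(experts) + NUM_SHARED
print(f"{active} of {NUM_EXPERTS + NUM_SHARED} experts active per token")
```

    Only 11 of 513 experts fire per token, which is what keeps the active parameter count near 3B despite the 80B total.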

    Instruction Optimization:
    • Supports instruct mode only, without thinking blocks
    • Pre-trained on 15T tokens, with specialized post-training for instruction following and task completion
    • Performance on par with Qwen3-235B while using significantly fewer resources

    Performance Characteristics:
    • 262K native context length, extensible to 1M tokens with YaRN scaling
    • More than 10x higher throughput on contexts over 32K tokens
    • SGLang and vLLM deployment support with Multi-Token Prediction
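    Extending the 262K native window toward 1M tokens with YaRN is typically done via a rope_scaling override at deployment time. The snippet below builds the kind of configuration the Qwen model cards describe for vLLM/SGLang; treat the exact keys and the factor of 4.0 as assumptions to verify against the official model card.

```python
NATIVE_CONTEXT = 262_144  # 262K native context length

# Hedged sketch of a YaRN override in the style of the Qwen model cards;
# verify key names and values against the official documentation.
rope_scaling = {
    "rope_type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": NATIVE_CONTEXT,
}

extended_context = int(NATIVE_CONTEXT * rope_scaling["factor"])
print(f"extended context: {extended_context} tokens")  # 1,048,576 ≈ 1M
```

    Note that static YaRN scaling applies the factor to all requests, so it is usually enabled only when long contexts are actually needed.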

  • Applications & use cases

    Task Automation:
    • Code generation and software development assistance with cost-effective processing
    • Content creation and editing with specific instructions
    • Data analysis and report generation following detailed guidelines

    Business Applications:
    • Customer service automation with instruction-based responses
    • Technical documentation generation with specific formatting requirements
    • Process automation and workflow optimization

    Agentic Use Cases:
    • Tool calling capabilities with MCP configuration support
    • Multi-step task execution with built-in and custom tools
    • Extended conversation and context-aware task completion up to 262K tokens
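    Tool calling uses the OpenAI-style function schema: each tool is described as JSON and passed via the tools parameter of chat.completions.create. The get_weather tool below is a hypothetical example (the actual API call is omitted since it needs a key); the dispatch step shows how a returned tool call's name and JSON arguments map onto a local function.

```python
import json

# Hypothetical tool definition in the OpenAI-style function-calling schema.
# Pass it as `tools=tools` to client.chat.completions.create.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical example tool
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"},
                },
                "required": ["city"],
            },
        },
    }
]

def get_weather(city: str) -> str:
    return f"Sunny in {city}"  # stub implementation

def dispatch(name, arguments_json):
    """Run a tool call returned by the model and produce the result
    to send back in a "tool" role message."""
    args = json.loads(arguments_json)
    return {"get_weather": get_weather}[name](**args)

result = dispatch("get_weather", '{"city": "New York"}')
print(result)  # → Sunny in New York
```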

Model details
  • Model provider
    Qwen
  • Type
    Chat
    LLM
  • Main use cases
    Chat
    Function Calling
  • Features
    Function Calling
  • Speed
    Very High
  • Intelligence
    Medium
  • Deployment
    Serverless
    On-Demand Dedicated
    Monthly Reserved
  • Parameters
    81.3B
  • Context length
    256K
  • Input price

    $0.15 / 1M tokens

  • Output price

    $1.50 / 1M tokens

  • Input modalities
    Text
  • Output modalities
    Text
  • Released
    September 9, 2025
  • Last updated
    September 12, 2025
  • Quantization level
    BF16
  • Category
    Chat
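At the listed serverless rates ($0.15 per 1M input tokens, $1.50 per 1M output tokens), per-request cost is straightforward to estimate:

```python
INPUT_PRICE = 0.15   # USD per 1M input tokens (listed serverless rate)
OUTPUT_PRICE = 1.50  # USD per 1M output tokens

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one request at the listed rates."""
    return (input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE) / 1_000_000

# e.g. a 200K-token context with a 2K-token completion
cost = estimate_cost(200_000, 2_000)
print(f"${cost:.4f}")  # → $0.0330
```

Output tokens cost 10x more than input tokens, so long-context summarization workloads (large input, small output) are where the pricing is most favorable.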