
Qwen3-Coder-Next

State-of-the-art coding agent with ultra-efficient 3B active inference.

About model

Qwen3-Coder-Next is an open-weight language model designed specifically for coding agents. With only 3B activated parameters (80B total), it achieves performance comparable to models with 10–20x more active parameters, making it highly cost-effective for production agent deployment. Through an elaborate training recipe, Qwen3-Coder-Next excels at long-horizon reasoning, complex tool usage, and recovery from execution failures, ensuring robust performance in dynamic coding tasks. With 256K context length and advanced tool calling capabilities, it delivers state-of-the-art agentic coding on Together AI's production infrastructure.

  • SWE-Bench Verified (w/ SWE-Agent): 74.2% (production-level autonomous coding)
  • Activated Parameters: 3B (performing like 30-60B models)
  • Parameter Efficiency: 10-20x (cost savings for agent workloads)

Model key capabilities
  • Ultra-Efficient Architecture: Only 3B activated parameters (80B total) achieving performance comparable to models with 10-20x more active parameters—highly cost-effective for production agent deployment
  • Advanced Agentic Capabilities: Long-horizon reasoning, complex tool usage, and recovery from execution failures—ensuring robust performance in dynamic coding workflows
  • Leading Coding Performance: 74.2% SWE-Bench Verified, 63.7% SWE-Bench Multilingual, 69.9% Aider—state-of-the-art agentic coding on Together AI
  • Production-Ready Infrastructure: 256K context with 99.9% SLA on the AI Native Cloud—available on serverless and dedicated endpoints

  • API usage

    • cURL
    • Python
    • TypeScript

    Endpoint:

    Qwen/Qwen3-Coder-Next-FP8

    curl -X POST https://api.together.xyz/v1/chat/completions \
      -H "Content-Type: application/json" \
      -H "Authorization: Bearer $TOGETHER_API_KEY" \
      -d '{
        "model": "Qwen/Qwen3-Coder-Next-FP8",
        "messages": [{
          "role": "user",
          "content": "Given two binary strings `a` and `b`, return their sum as a binary string"
        }]
      }'
    
    from together import Together

    client = Together()
    response = client.chat.completions.create(
        model="Qwen/Qwen3-Coder-Next-FP8",
        messages=[
            {
                "role": "user",
                "content": "Given two binary strings `a` and `b`, return their sum as a binary string"
            }
        ],
    )

    print(response.choices[0].message.content)
    
    
    import Together from "together-ai";
    
    const together = new Together();
    
    async function main() {
      const response = await together.chat.completions.create({
        model: "Qwen/Qwen3-Coder-Next-FP8",
        messages: [{
          role: "user",
          content: "Given two binary strings `a` and `b`, return their sum as a binary string"
        }]
      });
      
      console.log(response.choices[0]?.message?.content);
    }
    
    main();
    
    
  • Model card

    Architecture Overview:
    • Mixture-of-Experts (MoE) architecture with 80B total parameters and 3B activated parameters
    • 48 layers with hybrid layout: 12 × (3 × (Gated DeltaNet → MoE) → 1 × (Gated Attention → MoE))
    • Gated Attention: 16 attention heads for Q, 2 for KV, head dimension 256, rotary position embedding dimension 64
    • Gated DeltaNet: 32 linear attention heads for V, 16 for QK, head dimension 128
    • 512 experts with 10 activated per token, 1 shared expert, expert intermediate dimension 512
    • Hidden dimension 2048 with 79B non-embedding parameters
    • 256K context length (262,144 tokens natively)
    • Non-thinking mode only—does not generate thinking blocks
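
As a back-of-envelope check on the MoE figures above (hidden dimension 2048, expert intermediate dimension 512, 10 routed plus 1 shared expert per token, 48 layers), the per-token expert parameter count can be estimated directly. This sketch assumes a SwiGLU-style expert with three projection matrices and deliberately ignores attention, DeltaNet, and embedding weights, so it only approximates the expert share of the ~3B activated total:

```python
# Rough estimate of activated expert parameters per token, from the
# architecture figures listed above. Assumes a SwiGLU-style expert
# (gate, up, down projections); attention/DeltaNet weights are excluded.
hidden, inter, layers = 2048, 512, 48
experts_per_token = 10 + 1  # routed experts + shared expert

per_expert = 3 * hidden * inter                     # three projection matrices
active_expert_params = per_expert * experts_per_token * layers
print(f"{active_expert_params / 1e9:.2f}B")         # expert share of active params
```

The remainder of the ~3B activated budget sits in the attention, DeltaNet, and embedding weights that run for every token.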

    Training Methodology:
    • Pretraining and post-training stages optimized for coding agents
    • Elaborate training recipe for long-horizon reasoning and complex tool usage
    • Specialized training for execution failure recovery and dynamic coding tasks
    • Trained for seamless integration with diverse development environments

    Performance Characteristics:
    • Ultra-efficient: 3B activated parameters achieving performance comparable to models with 30-60B active parameters
    • Leading agentic coding: 74.2% SWE-Bench Verified (w/ SWE-Agent), 63.7% SWE-Bench Multilingual, 44.3% SWE-Bench Pro
    • Strong autonomous coding: 69.9% Aider, 39.3% Terminal-Bench 2.0 (w/ Terminus-2 json)
    • Outperforms larger models: beats DeepSeek-V3.2 (37B active), GLM-4.7 (32B active), MiniMax M2.1 (10B active)
    • Cost-effective deployment: 10-20x parameter efficiency advantage for agent workloads
    • Advanced tool calling capabilities with native support for complex function orchestration

  • Applications & use cases

    Agentic Software Development:
    • Production-level autonomous coding: 74.2% SWE-Bench Verified, 63.7% SWE-Bench Multilingual, 44.3% SWE-Bench Pro
    • Long-horizon reasoning across complex codebases with 256K context
    • Execution failure recovery—adapts when plans don't work as expected
    • Multi-step development workflows with precision tool invocation
    • Repository-scale navigation and bug fixing
    • Code review, refactoring, and optimization tasks

    Advanced Tool Calling & Orchestration:
    • Native support for complex function calling and tool orchestration
    • Dynamic tool selection and sequential execution
    • Error handling and recovery from tool execution failures
    • Multi-tool workflows for comprehensive development tasks
    • Function definition, invocation, and result processing
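
A minimal sketch of the orchestration loop above, using the OpenAI-style function-calling format that the chat completions endpoint accepts via its `tools` parameter. The `run_tests` tool, its arguments, and the stubbed dispatch result are hypothetical illustrations, not part of the API:

```python
import json

# Hypothetical tool schema in the OpenAI-style function-calling format
# passed to the chat completions endpoint via the `tools` parameter.
tools = [{
    "type": "function",
    "function": {
        "name": "run_tests",  # hypothetical tool name
        "description": "Run the project's test suite and return a summary.",
        "parameters": {
            "type": "object",
            "properties": {
                "path": {"type": "string", "description": "Test file or directory"},
            },
            "required": ["path"],
        },
    },
}]

def dispatch(tool_call: dict) -> str:
    """Execute one tool call emitted by the model (stubbed locally here)."""
    args = json.loads(tool_call["function"]["arguments"])
    if tool_call["function"]["name"] == "run_tests":
        return f"ran tests in {args['path']}"  # a real agent would shell out here
    raise ValueError(f"unknown tool: {tool_call['function']['name']}")

# In a live loop you would pass `tools=tools` alongside `messages`; when a
# response carries `tool_calls`, dispatch each one and append its result as
# a {"role": "tool", ...} message before calling the model again.
example_call = {"function": {"name": "run_tests", "arguments": '{"path": "tests/"}'}}
print(dispatch(example_call))
```

The model's failure-recovery training applies at exactly this point: if a dispatched tool returns an error, that error goes back as the tool message and the model replans.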

    Autonomous Coding Assistance:
    • Code generation from natural language descriptions
    • Automated testing and test case generation
    • Documentation generation and code commenting
    • Debugging and error diagnosis with suggested fixes
    • 69.9% Aider performance—strong autonomous coding assistance

    Cost-Effective Agent Deployment:
    • Ultra-efficient: 3B activated parameters performing like 30-60B models
    • 10-20x parameter efficiency advantage reduces infrastructure costs
    • Highly cost-effective at $0.50 input / $1.20 output per 1M tokens for production agent workloads
    • Scales from prototyping to production without cost explosion
    • Ideal for startups and enterprises deploying coding agents at scale
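
The listed serverless prices ($0.50 per 1M input tokens, $1.20 per 1M output tokens) make per-request costs easy to estimate; the token counts in the example below are illustrative, not measured:

```python
# Cost sketch at the listed serverless prices:
# $0.50 per 1M input tokens, $1.20 per 1M output tokens.
INPUT_PER_M, OUTPUT_PER_M = 0.50, 1.20

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request at the listed per-million-token prices."""
    return input_tokens / 1e6 * INPUT_PER_M + output_tokens / 1e6 * OUTPUT_PER_M

# e.g. a hypothetical agent turn: 50K-token context in, 2K-token patch out.
print(f"${request_cost(50_000, 2_000):.4f}")
```

At these rates, even context-heavy agent turns stay in the sub-cent to few-cent range, which is what makes repository-scale agent loops economical.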

    Development Workflow Automation:
    • End-to-end feature implementation from specification to working code
    • Automated code migration and refactoring across files
    • Batch processing of code changes across repositories
    • CI/CD pipeline integration for automated code generation
    • Technical debt reduction through automated refactoring

Model details
  • Model provider
    Qwen
  • Type
    Code
    Chat
    LLM
  • Main use cases
    Chat
    Coding Agents
  • Speed
    Very High
  • Intelligence
    High
  • Deployment
    Serverless
    Monthly Reserved
  • Parameters
    79.7B
  • Context length
    262K
  • Input price

    $0.50 / 1M tokens

  • Output price

    $1.20 / 1M tokens

  • Input modalities
    Text
  • Output modalities
    Text
  • Released
    February 1, 2026
  • Last updated
    February 2, 2026
  • Quantization level
    FP8
  • Category
    Code