Reasoning
Chat
Code
LLM

Qwen3.7-Max

Qwen's flagship model for the agent era with 1M context and long-horizon autonomy

About model

Qwen3.7-Max is Qwen's flagship proprietary model built for the agent era, combining frontier reasoning with deep, generalizable agentic capabilities across coding, office automation, and long-horizon task execution. It leads on Terminal-Bench 2.0-Terminus (69.7) and scores 92.4% on GPQA Diamond, 80.4% on SWE-Bench Verified, and 97.1% on HMMT 2026 Feb. According to Qwen, the model maintained coherent execution across a ~35-hour autonomous session and generalizes across agent scaffolds without framework-specific tuning. It is available on Together AI with a 1M-token context window.

  • Terminal-Bench 2.0-Terminus: 69.7 (leads the field on agentic terminal coding and execution)
  • GPQA Diamond: 92.4% (top scientific and mathematical reasoning)
  • Autonomous Execution: ~35 hours (10.0x kernel speedup on unseen hardware)

Model key capabilities
  • Agentic Coding: 80.4% SWE-Bench Verified and 78.3% SWE-Multilingual.
  • Long-Horizon Autonomy: Maintained coherent execution across a Qwen-reported ~35-hour session that produced a 10.0x kernel speedup on unseen hardware.
  • General Agent Workflows: Strong MCP tool orchestration (60.8% MCP-Mark, 76.4% MCP-Atlas) and office automation including spreadsheet work at 87.0% SpreadSheetBench-v1.
  • Reasoning and Instruction Following: 92.4% GPQA Diamond and 97.1% HMMT 2026 Feb for scientific and mathematical reasoning, 79.1% IFBench for instruction following.
Performance benchmarks

Benchmarks compared: AIME 2025, GPQA Diamond, HLE, LiveCodeBench, MATH500, SWE-bench Verified

Qwen3.7-Max
  • LiveCodeBench: 91.6%
  • SWE-bench Verified: 80.4%

Competitor closed-source models (scores in column order; blank cells omitted)
  • Claude Opus 4.6: 90.5%, 34.2%, 78.7%
  • OpenAI o3: 83.3%, 24.9%, 99.2%, 62.3%
  • OpenAI o1: 76.8%, 96.4%, 48.9%
  • GPT-4o: 49.2%, 2.7%, 32.3%, 89.3%, 31.0%

  • API usage

    Model ID:

    Qwen/Qwen3.7-Max

    cURL:

    curl -X POST "https://api.together.xyz/v1/chat/completions" \
      -H "Authorization: Bearer $TOGETHER_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{
        "model": "Qwen/Qwen3.7-Max",
        "messages": [
          {
            "role": "user",
            "content": "What are some fun things to do in New York?"
          }
        ]
      }'

    Python:

    from together import Together

    # The client reads the API key from the TOGETHER_API_KEY environment variable
    client = Together()

    response = client.chat.completions.create(
      model="Qwen/Qwen3.7-Max",
      messages=[
        {
          "role": "user",
          "content": "What are some fun things to do in New York?"
        }
      ]
    )
    # Print the text of the first returned choice
    print(response.choices[0].message.content)

    TypeScript:

    import Together from 'together-ai';

    // The client reads the API key from the TOGETHER_API_KEY environment variable
    const together = new Together();

    const completion = await together.chat.completions.create({
      model: 'Qwen/Qwen3.7-Max',
      messages: [
        {
          role: 'user',
          content: 'What are some fun things to do in New York?'
        }
      ],
    });

    // Print the text of the first returned choice
    console.log(completion.choices[0].message.content);
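
For environments where installing an SDK is not an option, the same call can be sketched with only the Python standard library. This is a minimal illustration rather than an official client: it reuses the URL, headers, and body from the cURL example, the helper names `build_chat_request` and `send` are hypothetical, and it assumes the response carries the `choices[0].message.content` shape that the SDK examples read. It has not been verified against the live service.

```python
import json
import os
import urllib.request

API_URL = "https://api.together.xyz/v1/chat/completions"  # same URL as the cURL example
MODEL_ID = "Qwen/Qwen3.7-Max"


def build_chat_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request equivalent to the cURL example."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> str:
    """Send the request and return the first choice's message text.

    Assumes the response shape read by the SDK examples on this page.
    """
    with urllib.request.urlopen(request, timeout=60) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


if __name__ == "__main__":
    key = os.environ.get("TOGETHER_API_KEY")
    if key:  # only touch the network when a key is configured
        req = build_chat_request("What are some fun things to do in New York?", key)
        print(send(req))
```

Separating request construction from sending keeps the payload easy to inspect or log before any tokens are billed.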
    
  • Model card

    Architecture Overview:
    • Proprietary model with a 1M-token context window
    • Text-only input and output

    Training Methodology:
    • Training decoupled tasks, execution frameworks, and validators, and used cross-framework reinforcement learning to avoid shortcut overfitting to specific benchmarks

    Performance Characteristics:
    • Coding agents: 69.7 Terminal-Bench 2.0-Terminus, 80.4% SWE-Bench Verified, 60.6% SWE-Pro, 78.3% SWE-Multilingual, 53.5% SciCode
    • General agents: 60.8% MCP-Mark, 76.4% MCP-Atlas, 87.0% SpreadSheetBench-v1, 75.0% BFCL-V4
    • Reasoning: 92.4% GPQA Diamond, 97.1% HMMT 2026 Feb, 41.4% HLE, 91.6% LiveCodeBench
    • Instruction following: 79.1% IFBench, 94.3% IFEval
    • Multilingual: 85.8% WMT24++, 90.3% MMMLU, 89.2% MAXIFE
    • Long context: 90.4% MRCR-v2 128K

  • Prompting

    Together AI API Access:
    • Access Qwen3.7-Max through the Together AI chat completions API using the model ID Qwen/Qwen3.7-Max
    • Authenticate by sending your Together AI API key as a Bearer token in the Authorization request header
    • Requests can use a context window of up to 1M tokens
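
Before sending a very long prompt, it can help to sanity-check it against the 1M-token window. The sketch below is only a rough pre-flight check under stated assumptions: it treats "1M" as exactly 1,000,000 tokens, uses a crude 4-characters-per-token heuristic instead of the model's real tokenizer, and assumes that reserved output tokens count against the same window. The helper names are hypothetical.

```python
CONTEXT_LIMIT_TOKENS = 1_000_000  # "1M" from this page; the exact limit is an assumption
CHARS_PER_TOKEN = 4               # rough heuristic, NOT the model's real tokenizer


def estimate_tokens(text: str) -> int:
    """Very rough token estimate: ceil(len(text) / CHARS_PER_TOKEN)."""
    return -(-len(text) // CHARS_PER_TOKEN)


def fits_context(messages: list[dict], reserved_output_tokens: int = 0) -> bool:
    """Return True if the estimated prompt plus reserved output fits the window."""
    prompt_tokens = sum(estimate_tokens(m["content"]) for m in messages)
    return prompt_tokens + reserved_output_tokens <= CONTEXT_LIMIT_TOKENS


if __name__ == "__main__":
    msgs = [{"role": "user", "content": "Summarize this repository. " * 1000}]
    print(estimate_tokens(msgs[0]["content"]), fits_context(msgs, reserved_output_tokens=8_000))
```

Because the heuristic can be off by a wide margin for code or non-English text, leaving generous headroom is safer than relying on the estimate near the limit.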

  • Applications & use cases

    Agentic Coding & Software Engineering:
    • Repository-level reasoning with 80.4% SWE-Bench Verified and 60.6% SWE-Pro
    • Cross-language engineering with 78.3% SWE-Multilingual

    Long-Horizon Autonomous Tasks:
    • Sustained execution across multi-hour sessions with a coherent strategy, according to Qwen
    • Kernel optimization, hardware profiling, and iterative technical tasks on unseen platforms

    Office Automation & Productivity:
    • Document generation, data analysis, and formatting through MCP and multi-agent orchestration
    • 87.0% SpreadSheetBench-v1 for complex spreadsheet reasoning and automation

    Reasoning & Multilingual Workflows:
    • 92.4% GPQA Diamond and 97.1% HMMT 2026 Feb for scientific and mathematical problem solving
    • 79.1% IFBench for precise instruction following in complex multi-step tasks
    • 85.8% WMT24++ and 90.3% MMMLU for multilingual understanding and translation
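
The tool-orchestration use cases above rely on the function calling feature listed for this model. The sketch below shows one way the client side of that loop might look. It assumes the API accepts an OpenAI-style `tools` array and returns tool calls whose `arguments` field is a JSON string; neither detail is confirmed on this page. The `sum_cells` tool, the `REGISTRY` mapping, and `run_tool_call` are hypothetical names for illustration, and the example call is hand-written rather than real model output.

```python
import json


# Hypothetical tool for illustration; not part of any real API.
def sum_cells(values: list[float]) -> float:
    """Add up a list of numeric spreadsheet cell values."""
    return float(sum(values))


# Tool schema in the OpenAI-style format (assumed to be accepted by the API).
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "sum_cells",
            "description": "Add up a list of numeric spreadsheet cell values.",
            "parameters": {
                "type": "object",
                "properties": {
                    "values": {"type": "array", "items": {"type": "number"}}
                },
                "required": ["values"],
            },
        },
    }
]

# Maps tool names the model may request to local Python callables.
REGISTRY = {"sum_cells": sum_cells}


def run_tool_call(tool_call: dict) -> dict:
    """Execute one tool call and wrap the result as a `tool` role message."""
    fn = tool_call["function"]
    args = json.loads(fn["arguments"])  # arguments are assumed to arrive as a JSON string
    result = REGISTRY[fn["name"]](**args)
    return {
        "role": "tool",
        "tool_call_id": tool_call["id"],
        "content": json.dumps(result),
    }


if __name__ == "__main__":
    # Hand-written stand-in for a tool call returned by the model.
    example_call = {
        "id": "call_0",
        "type": "function",
        "function": {"name": "sum_cells", "arguments": '{"values": [1.5, 2.5, 4]}'},
    }
    print(run_tool_call(example_call))
```

In a full loop, `TOOLS` would be passed with the chat request and each returned `tool` message appended to the conversation before asking the model to continue.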

  • Model provider: Qwen
  • Type: Reasoning, Chat, Code, LLM
  • Main use cases: Reasoning
  • Features: Function Calling, JSON Mode
  • Deployment: Serverless
  • Context length: 1M
  • Input price: $2.50 / 1M tokens
  • Output price: $7.50 / 1M tokens
  • Input modalities: Text
  • Output modalities: Text
  • Released: May 18, 2026
  • Category: Chat
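
The listed prices translate into a simple per-request estimate. The sketch below assumes flat per-token billing at the rates shown on this page ($2.50 per 1M input tokens, $7.50 per 1M output tokens) with no tiers, caching discounts, or minimums; `estimate_cost_usd` is a hypothetical helper name.

```python
from decimal import Decimal

INPUT_PRICE_PER_M = Decimal("2.50")   # USD per 1M input tokens (from this page)
OUTPUT_PRICE_PER_M = Decimal("7.50")  # USD per 1M output tokens (from this page)
TOKENS_PER_M = Decimal(1_000_000)


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate request cost in USD, assuming flat per-token pricing."""
    cost = (
        Decimal(input_tokens) * INPUT_PRICE_PER_M
        + Decimal(output_tokens) * OUTPUT_PRICE_PER_M
    ) / TOKENS_PER_M
    return cost.quantize(Decimal("0.0001"))  # round to a hundredth of a cent


if __name__ == "__main__":
    # 200,000 input tokens and 4,000 output tokens
    print(estimate_cost_usd(200_000, 4_000))
```

For example, 200,000 input tokens cost $0.50 and 4,000 output tokens cost $0.03, so the request totals about $0.53 under these assumptions.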