Qwen3.7-Max
Qwen's flagship model for the agent era with 1M context and long-horizon autonomy
About model
Qwen3.7-Max is Qwen's flagship proprietary model built for the agent era, combining frontier reasoning with deep, generalizable agentic capabilities across coding, office automation, and long-horizon task execution. It leads on Terminal-Bench 2.0-Terminus (69.7) and achieves 92.4% GPQA Diamond, 80.4% SWE-Bench Verified, and 97.1% HMMT 2026 Feb. According to Qwen, the model maintained coherent execution across a ~35-hour autonomous session, generalizing across agent scaffolds without framework-specific tuning. Available on Together AI with a 1M token context window.
69.7
Leads the field on agentic terminal coding and execution
92.40%
Top scientific and mathematical reasoning
35hrs
10.0x kernel speedup on unseen hardware
- Agentic Coding: 80.4% SWE-Bench Verified and 78.3% SWE-Multilingual.
- Long-Horizon Autonomy: Maintained coherent execution across a Qwen-reported ~35-hour session, demonstrating a 10.0x kernel speedup on unseen hardware.
- General Agent Workflows: Strong MCP tool orchestration (60.8% MCP-Mark, 76.4% MCP-Atlas) and office automation including spreadsheet work at 87.0% SpreadSheetBench-v1.
- Reasoning and Instruction Following: 92.4% GPQA Diamond and 97.1% HMMT 2026 Feb for scientific and mathematical reasoning, 79.1% IFBench for instruction following.
Model | AIME 2025 | GPQA Diamond | HLE | LiveCodeBench | MATH500 | SWE-bench verified |
|---|---|---|---|---|---|---|
Qwen3.7-Max | 91.6 | 80.4 | Related open-source models | Competitor closed-source models | ||
90.5% | 34.2% | 78.7% | ||||
83.3% | 24.9% | 99.2% | 62.3% | |||
76.8% | 96.4% | 48.9% | ||||
49.2% | 2.7% | 32.3% | 89.3% | 31.0% |
API usage
Endpoint:
Model card
Architecture Overview:
• Proprietary model with 1.0M token context window
• Text input and output; text-only
Training Methodology:
• Decoupled tasks, execution frameworks, and validators during training and employed cross-framework reinforcement learning to avoid shortcut overfitting to specific benchmarks
Performance Characteristics:
• Coding agents: 69.7 Terminal-Bench 2.0-Terminus, 80.4% SWE-Verified, 60.6% SWE-Pro, 78.3% SWE-Multilingual, 53.5% SciCode
• General agents: 60.8% MCP-Mark, 76.4% MCP-Atlas, 87.0% SpreadSheetBench-v1, 75.0% BFCL-V4
• Reasoning: 92.4% GPQA Diamond, 97.1% HMMT 2026 Feb, 41.4% HLE, 91.6% LiveCodeBench
• Instruction following: 79.1% IFBench, 94.3% IFEval
• Multilingual: 85.8% WMT24++, 90.3% MMMLU, 89.2% MAXIFE
• Long context: 90.4% MRCR-v2 128K
Prompting
Together AI API Access:
• Access Qwen3.7-Max via Together AI APIs using the endpoint Qwen/Qwen3.7-Max
• Authenticate using your Together AI API key in request headers
• Available on Together AI with 1M token context
Applications & use cases
Agentic Coding & Software Engineering:
• Repository-level reasoning with 80.4% SWE-Verified and 60.6% SWE-Pro
• Cross-language engineering with 78.3% SWE-Multilingual
Long-Horizon Autonomous Tasks:
• Sustained execution across multi-hour sessions, maintaining coherent strategy according to Qwen
• Kernel optimization, hardware profiling, and iterative technical tasks on unseen platforms
Office Automation & Productivity:
• Document generation, data analysis, and formatting through MCP and multi-agent orchestration
• 87.0% SpreadSheetBench-v1 for complex spreadsheet reasoning and automation
Reasoning & Multilingual Workflows:
• 92.4% GPQA Diamond and 97.1% HMMT 2026 Feb for scientific and mathematical problem solving
• 79.1% IFBench for precise instruction following in complex multi-step tasks
• 85.8% WMT24++ and 90.3% MMMLU for multilingual understanding and translation
- TypeReasoningChatCodeLLM
- Main use casesReasoning
- FeaturesFunction CallingJSON Mode
- DeploymentServerless
- Endpoint
- Context length1M
- Input price
$2.50 / 1M tokens
- Output price
$7.50 / 1M tokens
- Input modalitiesText
- Output modalitiesText
- ReleasedMay 18, 2026
- CategoryChat