Qwen3.7-Plus
Multimodal agent with 1M context, vision-language understanding, and hybrid GUI and CLI control
About model
Qwen3.7-Plus is Alibaba’s cost-effective multimodal model in the Qwen3.7 series, built as a versatile agent foundation that pairs full-stack coding and productivity intelligence with a comprehensive vision-language upgrade. It can perceive real-world scenes, read screens and operate GUIs, generate code from visual references, and carry out end-to-end navigation inside mobile apps, while retaining text reasoning, tool use, and long-horizon planning. The model takes text and image input with text output, supports a 1M-token context window, and generalizes across agent frameworks rather than depending on a fixed scaffold. Available on Together AI with a 1M-token context window and function calling for agentic and multimodal workflows.
1M
Long-context reasoning and multi-step agent runs
53
Independent composite across reasoning, math, knowledge, and coding
GUI + CLI
Operates graphical and command-line interfaces in a single agent
- Multimodal Agent Control: Perceives real-world scenes, reads screens, and operates GUIs, generating code from visual references and navigating mobile apps end to end
- Versatile Coding Agent: Full-stack software engineering and scientific programming with results across Terminal-Bench 2.0, the SWE-bench series, and SciCode
- Long-Context Reasoning: A 1M-token context window for long-document analysis, multi-step planning, and tool use that generalizes across agent frameworks
- Available on Together AI: Unified API access with function calling and structured output for production agent and multimodal workflows
API usage
Endpoint:
Model card
Architecture Overview:
• Proprietary multimodal model in the Qwen3.7 series, positioned as the cost-effective tier
• Text and image input with text output, with a comprehensive vision-language upgrade over the prior generation
• 1M-token context window with up to 65,536 output tokens
• Multimodal interactive hybrid agent: perceives scenes, reads screens, operates GUIs, and navigates mobile apps end to end
• Generalizes across agent frameworks rather than relying on a fixed scaffold
Performance Characteristics:
• Agentic coding across Terminal-Bench 2.0, the SWE-bench series, and SciCode
• General-agent tool use and planning on MCP-Mark, Deep-Planning, and Kernel Bench L3
• Hard-STEM reasoning on GPQA Diamond, HMMT, and IMOAnswerBench
• Multimodal reasoning on MathVision, ERQA, and VisFactor, with gains on visual agent tasks including ScreenSpot Pro, OSWorld-Verified, and AndroidWorld
• Independent score of 53 on the Artificial Analysis Intelligence Index
Prompting
Together AI API Access:
• Access Qwen3.7-Plus via Together AI APIs using the endpoint Qwen/Qwen3.7-Plus
• Authenticate using your Together AI API key in request headers
• Send text and image inputs and receive text output
• Supports function calling and structured output for tool-using agents
• Available on Together AI with a 1M-token context window
Applications & use cases
Agentic Coding:
• Full-stack software engineering and debugging across repositories
• Scientific and GPU-kernel programming with multi-step planning
• Coding agents that generalize across frameworks
Multimodal & GUI Automation:
• Reading screens and operating graphical interfaces
• Generating code from screenshots and visual references
• End-to-end navigation within mobile apps
Long-Context Workflows:
• Long-document analysis and synthesis across a 1M-token window
• Multi-step research and planning agents
Office & Productivity:
• Document generation, data analysis, and tool-driven productivity workflows
- TypeImageVisionReasoningChatCode
- Main use casesChat
- FeaturesFunction CallingJSON Mode
- SpeedHigh
- IntelligenceHigh
- DeploymentServerless
- Endpoint
- Context length1M
- Input price
$0.32 / 1M tokens
- Output price
$1.28 / 1M tokens
- Input modalitiesTextImage
- Output modalitiesText
- ReleasedJune 1, 2026
- CategoryChat