Qwen3.7-Plus

Multimodal agent with 1M context, vision-language understanding, and hybrid GUI and CLI control

About model

Qwen3.7-Plus is the cost-effective multimodal model in Alibaba's Qwen3.7 series, built as a versatile agent foundation that pairs full-stack coding and productivity intelligence with a comprehensive vision-language upgrade. It can perceive real-world scenes, read screens and operate GUIs, generate code from visual references, and navigate mobile apps end to end, while retaining text reasoning, tool use, and long-horizon planning. The model accepts text and image input, produces text output, supports a 1M-token context window, and generalizes across agent frameworks rather than depending on a fixed scaffold. It is available on Together AI with function calling for agentic and multimodal workflows.

Context Window

1M

Long-context reasoning and multi-step agent runs

Artificial Analysis Intelligence Index

53

Independent composite across reasoning, math, knowledge, and coding

Hybrid Agent Control

GUI + CLI

Operates graphical and command-line interfaces in a single agent

Model key capabilities

• Multimodal Agent Control: Perceives real-world scenes, reads screens, operates GUIs, generates code from visual references, and navigates mobile apps end to end
• Versatile Coding Agent: Full-stack software engineering and scientific programming, evaluated on Terminal-Bench 2.0, the SWE-bench series, and SciCode
• Long-Context Reasoning: A 1M-token context window for long-document analysis, multi-step planning, and tool use that generalizes across agent frameworks
• Available on Together AI: Unified API access with function calling and structured output for production agent and multimodal workflows

Performance benchmarks

Model	GPQA Diamond	HLE
Qwen3.7-Plus	90.0%	33%
Competitor closed-source models
Claude Fable 5	92.6%	53%
Claude Opus 5	93.2%	53%
GPT-5.6 Sol	94.1%	47%
Grok 4.5	93.1%	40%

API usage

cURL
Python
TypeScript

Endpoint:

Qwen/Qwen3.7-Plus

curl -X POST "https://api.together.xyz/v1/chat/completions" \
  -H "Authorization: Bearer $TOGETHER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen3.7-Plus",
    "messages": [
      {
        "role": "user",
        "content": "What are some fun things to do in New York?"
      }
    ]
  }'

from together import Together

# The client reads the API key from the TOGETHER_API_KEY environment variable.
client = Together()

response = client.chat.completions.create(
    model="Qwen/Qwen3.7-Plus",
    messages=[
        {
            "role": "user",
            "content": "What are some fun things to do in New York?",
        }
    ],
)
print(response.choices[0].message.content)

import Together from 'together-ai';
const together = new Together();

const completion = await together.chat.completions.create({
  model: 'Qwen/Qwen3.7-Plus',
  messages: [
    {
      role: 'user',
      content: 'What are some fun things to do in New York?'
    }
  ],
});

console.log(completion.choices[0].message.content);
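The examples above send text only. Since the model also accepts image input, a request can pair an image with a question. The sketch below only builds the message list and makes no API call. It assumes the endpoint accepts OpenAI-style `image_url` content parts, which this page does not confirm; the helper names and the example URL are hypothetical.

```python
import base64


def image_to_data_url(image_bytes: bytes, mime_type: str = "image/png") -> str:
    """Encode raw image bytes as a data URL so they can be sent inline."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def build_vision_messages(question: str, image_url: str) -> list[dict]:
    """Build one user message holding a text part and an image part.

    Assumption: OpenAI-style content parts ("text" and "image_url").
    """
    return [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": question},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        }
    ]


messages = build_vision_messages(
    "Which button on this screen submits the form?",
    "https://example.com/screenshot.png",  # hypothetical URL
)
```

If the assumption holds, the resulting list can be passed as `messages` to `client.chat.completions.create(...)` in place of the text-only message used in the Python example.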

Model card
Architecture Overview:
• Proprietary multimodal model in the Qwen3.7 series, positioned as the cost-effective tier
• Text and image input with text output, with a comprehensive vision-language upgrade over the prior generation
• 1M-token context window with up to 65,536 output tokens
• Multimodal interactive hybrid agent: perceives scenes, reads screens, operates GUIs, and navigates mobile apps end to end
• Generalizes across agent frameworks rather than relying on a fixed scaffold
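The context and output limits listed above suggest a simple budget check before sending a long prompt. This is a minimal sketch under two assumptions that the page does not state: that "1M" means exactly 1,000,000 tokens, and that prompt and output tokens share the same window. The function name `output_budget` is hypothetical.

```python
CONTEXT_WINDOW = 1_000_000  # assumption: "1M" read as 1,000,000 tokens
MAX_OUTPUT_TOKENS = 65_536  # output cap listed in the model card


def output_budget(prompt_tokens: int, requested_output: int = MAX_OUTPUT_TOKENS) -> int:
    """Return the largest output-token limit that fits alongside the prompt.

    Assumption: prompt and output tokens count against the same window.
    """
    if prompt_tokens < 0 or requested_output < 0:
        raise ValueError("token counts must be non-negative")
    remaining = CONTEXT_WINDOW - prompt_tokens
    if remaining <= 0:
        raise ValueError("prompt leaves no room for output")
    # Never exceed the request, the model's output cap, or the space left.
    return min(requested_output, MAX_OUTPUT_TOKENS, remaining)
```

Under these assumptions, a prompt only starts to squeeze the output cap once it passes roughly 934,000 tokens.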

Performance Characteristics:
• Agentic coding across Terminal-Bench 2.0, the SWE-bench series, and SciCode
• General-agent tool use and planning on MCP-Mark, Deep-Planning, and Kernel Bench L3
• Hard-STEM reasoning on GPQA Diamond, HMMT, and IMOAnswerBench
• Multimodal reasoning on MathVision, ERQA, and VisFactor, with gains on visual agent tasks including ScreenSpot Pro, OSWorld-Verified, and AndroidWorld
• Independent score of 53 on the Artificial Analysis Intelligence Index
Prompting
Together AI API Access:
• Access Qwen3.7-Plus via Together AI APIs using the endpoint Qwen/Qwen3.7-Plus
• Authenticate using your Together AI API key in request headers
• Send text and image inputs and receive text output
• Supports function calling and structured output for tool-using agents
• Available on Together AI with a 1M-token context window
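For the function-calling support listed above, a tool-using agent typically declares its tools as JSON Schema and runs whatever call the model requests. The sketch below shows only the local side of that exchange and makes no API call. It assumes the OpenAI-style `tools` format and that the model returns a tool name plus a JSON string of arguments; `get_weather`, `TOOLS`, `REGISTRY`, and `run_tool_call` are hypothetical names used for illustration.

```python
import json


def get_weather(city: str) -> dict:
    """Hypothetical tool; returns canned data instead of calling a service."""
    return {"city": city, "forecast": "sunny"}


# Tool declaration in the assumed OpenAI-style schema.
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Look up the current forecast for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

# Maps tool names the model may request to local Python callables.
REGISTRY = {"get_weather": get_weather}


def run_tool_call(name: str, arguments_json: str) -> str:
    """Run a requested tool and return its result as a JSON string."""
    if name not in REGISTRY:
        return json.dumps({"error": f"unknown tool: {name}"})
    args = json.loads(arguments_json)
    return json.dumps(REGISTRY[name](**args))


result = run_tool_call("get_weather", '{"city": "New York"}')
```

In a full loop, `TOOLS` would be sent with the request and the string returned by `run_tool_call` would go back to the model as a tool message; both steps depend on the assumed request format.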
Applications & use cases
Agentic Coding:
• Full-stack software engineering and debugging across repositories
• Scientific and GPU-kernel programming with multi-step planning
• Coding agents that generalize across frameworks

Multimodal & GUI Automation:
• Reading screens and operating graphical interfaces
• Generating code from screenshots and visual references
• End-to-end navigation within mobile apps

Long-Context Workflows:
• Long-document analysis and synthesis across a 1M-token window
• Multi-step research and planning agents

Office & Productivity:
• Document generation, data analysis, and tool-driven productivity workflows

Model specifications

Model data

Model provider: Qwen
Type: Image, Vision, Reasoning, Chat, Code
Main use cases: Chat
Features: Function Calling, JSON Mode
Speed: High
Intelligence: High
Deployment: Serverless
Endpoint: Qwen/Qwen3.7-Plus
Context length: 1M
Input price: $0.32 / 1M tokens
Output price: $1.28 / 1M tokens
Input modalities: Text, Image
Output modalities: Text
Released: June 1, 2026
Category: Chat
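The listed prices make per-request cost easy to estimate. The sketch below applies the input and output rates from the specifications; the function name `estimate_cost_usd` is hypothetical, and the calculation assumes both rates apply linearly with no tiering, caching discounts, or separate image pricing.

```python
INPUT_PRICE_PER_M = 0.32   # USD per 1M input tokens, from the specifications
OUTPUT_PRICE_PER_M = 1.28  # USD per 1M output tokens, from the specifications


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request, assuming simple linear pricing."""
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000
```

For example, a 200,000-token prompt with a 4,000-token reply comes to about $0.069 under these assumptions.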
