Qwen3.7-Plus

Multimodal agent with 1M context, vision-language understanding, and hybrid GUI and CLI control

About model

Qwen3.7-Plus is Alibaba’s cost-effective multimodal model in the Qwen3.7 series, built as a versatile agent foundation that pairs full-stack coding and productivity intelligence with a comprehensive vision-language upgrade. It can perceive real-world scenes, read screens and operate GUIs, generate code from visual references, and carry out end-to-end navigation inside mobile apps, while retaining text reasoning, tool use, and long-horizon planning. The model takes text and image input with text output, supports a 1M-token context window, and generalizes across agent frameworks rather than depending on a fixed scaffold. It is available on Together AI with function calling for agentic and multimodal workflows.

Context Window

1M

Long-context reasoning and multi-step agent runs

Artificial Analysis Intelligence Index

53

Independent composite across reasoning, math, knowledge, and coding

Hybrid Agent Control

GUI + CLI

Operates graphical and command-line interfaces in a single agent

Model key capabilities
  • Multimodal Agent Control: Perceives real-world scenes, reads screens, operates GUIs, generates code from visual references, and navigates mobile apps end to end
  • Versatile Coding Agent: Full-stack software engineering and scientific programming, evaluated on Terminal-Bench 2.0, the SWE-bench series, and SciCode
  • Long-Context Reasoning: A 1M-token context window for long-document analysis, multi-step planning, and tool use that generalizes across agent frameworks
  • Available on Together AI: Unified API access with function calling and structured output for production agent and multimodal workflows
  • API usage

    Endpoint:

    Qwen/Qwen3.7-Plus

    cURL:

    curl -X POST "https://api.together.xyz/v1/chat/completions" \
      -H "Authorization: Bearer $TOGETHER_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{
        "model": "Qwen/Qwen3.7-Plus",
        "messages": [
          {
            "role": "user",
            "content": "What are some fun things to do in New York?"
          }
        ]
      }'

    Python:

    from together import Together

    # The client reads the API key from the TOGETHER_API_KEY environment variable.
    client = Together()

    response = client.chat.completions.create(
      model="Qwen/Qwen3.7-Plus",
      messages=[
        {
          "role": "user",
          "content": "What are some fun things to do in New York?"
        }
      ]
    )
    # Print the text of the first (and only) completion choice.
    print(response.choices[0].message.content)

    TypeScript:

    import Together from 'together-ai';

    // The client reads the API key from the TOGETHER_API_KEY environment variable.
    const together = new Together();

    const completion = await together.chat.completions.create({
      model: 'Qwen/Qwen3.7-Plus',
      messages: [
        {
          role: 'user',
          content: 'What are some fun things to do in New York?'
        }
      ],
    });

    // Print the text of the first (and only) completion choice.
    console.log(completion.choices[0].message.content);
    
  • Model card

    Architecture Overview:
    • Proprietary multimodal model in the Qwen3.7 series, positioned as the cost-effective tier
    • Text and image input with text output, with a comprehensive vision-language upgrade over the prior generation
    • 1M-token context window with up to 65,536 output tokens
    • Multimodal interactive hybrid agent: perceives scenes, reads screens, operates GUIs, and navigates mobile apps end to end
    • Generalizes across agent frameworks rather than relying on a fixed scaffold
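
    The context and output limits listed above suggest a simple budget check before sending a long prompt. The sketch below is illustrative only: it assumes "1M" means 1,000,000 tokens and that prompt and output tokens share one window, neither of which this page confirms. The helper name `max_output_budget` is hypothetical, and real prompt token counts would have to come from a tokenizer.

```python
CONTEXT_WINDOW = 1_000_000   # "1M" from the model card; the exact value is an assumption
MAX_OUTPUT_TOKENS = 65_536   # output cap stated in the model card


def max_output_budget(prompt_tokens: int) -> int:
    """Return how many output tokens can still be requested for a prompt.

    Assumes prompt and output share a single context window (not confirmed).
    """
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    remaining = CONTEXT_WINDOW - prompt_tokens
    # Never exceed the output cap, and never return a negative budget.
    return max(0, min(MAX_OUTPUT_TOKENS, remaining))


print(max_output_budget(200_000))  # 65536
print(max_output_budget(990_000))  # 10000
```

    Under these assumptions, a prompt only starts to squeeze the output budget once it leaves fewer than 65,536 tokens of the window free.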

    Performance Characteristics:
    • Agentic coding across Terminal-Bench 2.0, the SWE-bench series, and SciCode
    • General-agent tool use and planning on MCP-Mark, Deep-Planning, and Kernel Bench L3
    • Hard-STEM reasoning on GPQA Diamond, HMMT, and IMOAnswerBench
    • Multimodal reasoning on MathVision, ERQA, and VisFactor, with gains on visual agent tasks including ScreenSpot Pro, OSWorld-Verified, and AndroidWorld
    • Independent score of 53 on the Artificial Analysis Intelligence Index

  • Prompting

    Together AI API Access:
    • Access Qwen3.7-Plus via Together AI APIs using the endpoint Qwen/Qwen3.7-Plus
    • Authenticate using your Together AI API key in request headers
    • Send text and image inputs and receive text output
    • Supports function calling and structured output for tool-using agents
    • Available on Together AI with a 1M-token context window
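
    To illustrate the function-calling support mentioned above, here is a minimal sketch of a tool-enabled request payload. It assumes the endpoint accepts OpenAI-style `tools` definitions, as Together AI's chat completions API generally does; the `get_weather` tool and both helper functions are hypothetical. The code only builds and decodes payloads, so it runs without an API key.

```python
import json

# Hypothetical tool definition for illustration; not taken from the model page.
WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}


def build_tool_request(question: str) -> dict:
    """Build keyword arguments for a chat completion that may call a tool."""
    return {
        "model": "Qwen/Qwen3.7-Plus",
        "messages": [{"role": "user", "content": question}],
        "tools": [WEATHER_TOOL],
    }


def parse_tool_arguments(arguments_json: str) -> dict:
    """Decode the JSON string that a tool call carries as its arguments."""
    return json.loads(arguments_json)


request = build_tool_request("What is the weather in New York?")
print(json.dumps(request, indent=2))
```

    With an API key configured, the payload could be sent as `Together().chat.completions.create(**request)`; if the model decides to call the tool, its arguments would typically arrive as a JSON string that `parse_tool_arguments` can decode.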

  • Applications & use cases

    Agentic Coding:
    • Full-stack software engineering and debugging across repositories
    • Scientific and GPU-kernel programming with multi-step planning
    • Coding agents that generalize across frameworks

    Multimodal & GUI Automation:
    • Reading screens and operating graphical interfaces
    • Generating code from screenshots and visual references
    • End-to-end navigation within mobile apps
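
    For screenshot-driven tasks like these, a request needs to carry an image alongside the text prompt. The sketch below assumes the endpoint accepts OpenAI-style `image_url` content parts with base64 data URLs, which this page does not confirm for this model; `build_image_message` is a hypothetical helper, and the placeholder bytes stand in for a real screenshot.

```python
import base64


def build_image_message(prompt: str, image_bytes: bytes, mime_type: str = "image/png") -> dict:
    """Build a user message that pairs a text prompt with one inline image."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            # Assumed format: a data URL carrying the base64-encoded image.
            {"type": "image_url", "image_url": {"url": f"data:{mime_type};base64,{encoded}"}},
        ],
    }


# Placeholder bytes; in practice read a screenshot with open(path, "rb").read().
message = build_image_message("Generate HTML that reproduces this screen.", b"fake-png-bytes")
print(message["content"][0]["text"])
```

    The resulting dictionary can be passed as one entry of the `messages` list in a chat completion request.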

    Long-Context Workflows:
    • Long-document analysis and synthesis across a 1M-token window
    • Multi-step research and planning agents

    Office & Productivity:
    • Document generation, data analysis, and tool-driven productivity workflows

Model details
  • Model provider
    Qwen
  • Type
    Image
    Vision
    Reasoning
    Chat
    Code
  • Main use cases
    Chat
  • Features
    Function Calling
    JSON Mode
  • Speed
    High
  • Intelligence
    High
  • Deployment
    Serverless
  • Context length
    1M
  • Input price

    $0.32 / 1M tokens

  • Output price

    $1.28 / 1M tokens

  • Input modalities
    Text
    Image
  • Output modalities
    Text
  • Released
    June 1, 2026
  • Category
    Chat
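
The listed prices translate into a per-request cost estimate with simple arithmetic. The sketch below uses only the input and output prices shown above; the helper name `estimate_cost_usd` is hypothetical, and it assumes billing is linear in token count with no other fees.

```python
INPUT_PRICE_PER_M = 0.32   # USD per 1M input tokens (from the listing)
OUTPUT_PRICE_PER_M = 1.28  # USD per 1M output tokens (from the listing)


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate request cost in USD, assuming purely linear per-token pricing."""
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


# A prompt of 1,000,000 tokens plus the maximum 65,536 output tokens.
print(f"${estimate_cost_usd(1_000_000, 65_536):.4f}")  # $0.4039
```

Output tokens cost four times as much as input tokens at these rates, so long generations can outweigh even large prompts in the total.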