Reasoning
Vision
Chat

Qwen3.5 9B

Multimodal reasoning model with native tool calling on Together AI

Native Context

262K

Extensible to 1M+ tokens with RoPE scaling

BFCL-V4

66.1%

Native function calling for production agents

Languages

201

Global coverage with 81.2% MMMLU

Model key capabilities
  • Multimodal Reasoning: Unified text, image, and video understanding with 89.2% OCRBench, 84.5% VideoMME, and 78.9% MathVision
  • Native Tool Calling: Production-ready function calling with 66.1% BFCL-V4 and 79.1% TAU2-Bench across multi-agent workflows
  • Thinking Mode: Generates explicit reasoning traces before responses for improved accuracy on complex tasks
  • Global & Long Context: 201 languages with 81.2% MMMLU and 262K native context extensible to 1M+ tokens
  • API usage

    • cURL
    • Python
    • TypeScript

    Endpoint:

    Qwen/Qwen3.5-9B

    curl -X POST "https://api.together.xyz/v1/chat/completions" \
      -H "Authorization: Bearer $TOGETHER_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{
        "model": "Qwen/Qwen3.5-9B",
        "messages": [
          {
            "role": "user",
            "content": "What are some fun things to do in New York?"
          }
        ]
    }'
    
    from together import Together
    
    client = Together()
    
    response = client.chat.completions.create(
      model="Qwen/Qwen3.5-9B",
      messages=[
        {
          "role": "user",
          "content": "What are some fun things to do in New York?"
        }
      ]
    )
    print(response.choices[0].message.content)
    
    import Together from 'together-ai';
    const together = new Together();
    
    const completion = await together.chat.completions.create({
      model: 'Qwen/Qwen3.5-9B',
      messages: [
        {
          role: 'user',
          content: 'What are some fun things to do in New York?'
         }
      ],
    });
    
    console.log(completion.choices[0].message.content);
    
  • Model card

    Architecture Overview:
    • Hybrid Gated DeltaNet and Gated Attention architecture for efficient inference with reduced latency
    • 9 billion parameters optimized for multimodal understanding across text, images, and video
    • 262,144 token native context window, extensible to 1M+ tokens with RoPE scaling
    • Vision encoder supporting image and video inputs for cross-modal reasoning tasks

    Training Methodology:
    • Early fusion training on multimodal tokens achieving cross-generational performance parity
    • Reinforcement learning scaled across million-agent environments for robust real-world adaptability
    • Multi-token prediction (MTP) training for improved generation efficiency
    • Trained with progressively complex task distributions for production-grade reliability

    Performance Characteristics:
    • Strong mathematical reasoning: 78.9% on MathVision, 70.1% on MMMU-Pro
    • Competitive coding performance: 65.6% on LiveCodeBench v6
    • Leading agent capabilities: 66.1% on BFCL-V4, 79.1% on TAU2-Bench for function calling
    • Superior vision understanding: 84.5% on VideoMME, 89.2% on OCRBench
    • Extensive multilingual support across 201 languages with 81.2% on MMMLU
    • Long-context performance: 63.0% on AA-LCR, 55.2% on LongBench v2

  • Prompting

    API Access:
    • Access Qwen3.5 9B via Together AI APIs using the endpoint Qwen/Qwen3.5-9B
    • Standard Together AI authentication with API key
    • Supports text, image, and video inputs through unified chat interface
    • Native tool calling with Qwen3 Coder parser for agentic workflows

    Thinking Mode:
    • Qwen3.5 operates in thinking mode by default, generating reasoning content before final responses
    • Disable thinking mode via chat_template_kwargs: {"enable_thinking": False} for direct responses
    • Recommended sampling: temperature=1.0, top_p=0.95, top_k=20, presence_penalty=1.5 for thinking mode

    Note: Use a presence_penalty between 0 and 2 to reduce repetition. Multi-token prediction mode is available for improved throughput.
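The thinking-mode settings above can be sketched as a raw request body (a minimal sketch: the `chat_template_kwargs` field follows the note above, the sampling values are the recommended thinking-mode defaults, and the prompt text is illustrative):

```python
import json

# Request body for the chat completions endpoint with thinking mode
# disabled for a direct response (no reasoning trace).
payload = {
    "model": "Qwen/Qwen3.5-9B",
    "messages": [
        {"role": "user", "content": "Summarize RoPE scaling in one sentence."}
    ],
    "chat_template_kwargs": {"enable_thinking": False},
}

# Recommended sampling when thinking mode stays enabled (the default):
thinking_sampling = {
    "temperature": 1.0,
    "top_p": 0.95,
    "top_k": 20,
    "presence_penalty": 1.5,
}

# Serialized body, ready to POST with a Bearer token as in the cURL example.
body = json.dumps(payload)
```

Either dict can be merged into the same request body; the sampling keys sit at the top level alongside `model` and `messages`.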

  • Applications & use cases

    Multimodal AI Applications:
    • Visual question answering combining document understanding with spatial reasoning
    • Video content analysis and summarization for media workflows
    • OCR and document processing with 89.2% accuracy on OCRBench
    • Mathematical problem solving from images with step-by-step reasoning
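The image workflows above go through the same chat interface using OpenAI-style content parts (a minimal sketch: the image URL is a placeholder, and the content-part shape assumes the standard OpenAI-compatible format):

```python
import json

IMAGE_URL = "https://example.com/invoice.png"  # placeholder image

# A user turn mixing text and an image as a list of content parts.
payload = {
    "model": "Qwen/Qwen3.5-9B",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Extract the total amount from this invoice."},
                {"type": "image_url", "image_url": {"url": IMAGE_URL}},
            ],
        }
    ],
}

# POST this as JSON to the chat completions endpoint, as in the cURL example.
print(json.dumps(payload, indent=2))
```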

    Agentic Workflows:
    • Function calling and tool use with 66.1% accuracy on BFCL-V4
    • Multi-step agent orchestration for complex task automation
    • Code generation and debugging with reasoning capabilities
    • Autonomous task planning across coding, cybersecurity, finance, and search domains
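A function-calling request can be sketched as follows (a minimal sketch: the `get_weather` tool and its schema are hypothetical, and the `tools` field assumes the standard OpenAI-compatible shape):

```python
import json

# Hypothetical tool definition in the OpenAI-compatible "tools" format.
get_weather = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

payload = {
    "model": "Qwen/Qwen3.5-9B",
    "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
    "tools": [get_weather],
    "tool_choice": "auto",
}

# A tool-calling response carries message.tool_calls entries whose
# function.arguments field is a JSON string, parsed like so:
args = json.loads('{"city": "Paris"}')
```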

    Production Applications:
    • Long-context document analysis up to 262K tokens natively
    • Multilingual chatbots and customer support across 201 languages
    • RAG systems with vision-language understanding for knowledge retrieval
    • Research and data analysis combining text and visual information

    Enterprise Solutions:
    • Medical image analysis and VQA with specialized training
    • Financial document processing with OCR and reasoning
    • E-commerce product cataloging from images and videos
    • Educational platforms with math and science problem solving

Model specifications
  • Model provider
    Qwen
  • Type
    Reasoning
    Vision
    Chat
  • Main use cases
    Chat
    Vision
    Reasoning
  • Features
    Function Calling
    JSON Mode
  • Speed
    High
  • Intelligence
    High
  • Deployment
    Serverless
    On-Demand Dedicated
  • Parameters
    9B
  • Context length
    262K
  • Input price

    $0.10 / 1M tokens

  • Output price

    $0.15 / 1M tokens

  • Input modalities
    Text
    Image
    Video
  • Output modalities
    Text
  • Released
    February 23, 2026
  • Category
    Chat