Qwen3.5 9B
Multimodal reasoning model with native tool calling on Together AI
262K
Extensible to 1M+ tokens with RoPE scaling
66.1%
Native function calling for production agents
201
Global coverage with 81.2% MMMLU
- Multimodal Reasoning: Unified text, image, and video understanding with 89.2% OCRBench, 84.5% VideoMME, and 78.9% MathVision
- Native Tool Calling: Production-ready function calling with 66.1% BFCL-V4 and 79.1% TAU2-Bench across multi-agent workflows
- Thinking Mode: Generates explicit reasoning traces before responses for improved accuracy on complex tasks
- Global & Long Context: 201 languages with 81.2% MMMLU and 262K native context extensible to 1M+ tokens
API usage
Endpoint:
Model card
Architecture Overview:
• Hybrid Gated DeltaNet and Gated Attention architecture for efficient inference with reduced latency
• 9 billion parameters optimized for multimodal understanding across text, images, and video
• 262,144 token native context window, extensible to 1M+ tokens with RoPE scaling
• Vision encoder supporting image and video inputs for cross-modal reasoning tasks
Training Methodology:
• Early fusion training on multimodal tokens achieving cross-generational performance parity
• Reinforcement learning scaled across million-agent environments for robust real-world adaptability
• Multi-token prediction (MTP) training for improved generation efficiency
• Trained with progressively complex task distributions for production-grade reliability
Performance Characteristics:
• Strong mathematical reasoning: 78.9% on MathVision, 70.1% on MMMU-Pro
• Competitive coding performance: 65.6% on LiveCodeBench v6
• Leading agent capabilities: 66.1% on BFCL-V4, 79.1% on TAU2-Bench for function calling
• Superior vision understanding: 84.5% on VideoMME, 89.2% on OCRBench
• Extensive multilingual support across 201 languages with 81.2% on MMMLU
• Long-context performance: 63.0% on AA-LCR, 55.2% on LongBench v2
Prompting
API Access:
• Access Qwen3.5 9B via Together AI APIs using the endpoint Qwen/Qwen3.5-9B
• Standard Together AI authentication with API key
• Supports text, image, and video inputs through unified chat interface
• Native tool calling with Qwen3 Coder parser for agentic workflows
Thinking Mode:
• Qwen3.5 operates in thinking mode by default, generating reasoning content before final responses
• Disable thinking mode via chat_template_kwargs: {"enable_thinking": False} for direct responses
• Recommended sampling: temperature=1.0, top_p=0.95, top_k=20, presence_penalty=1.5 for thinking mode
Note: Use presence_penalty between 0-2 to reduce repetitions. Multi-token prediction mode available for improved throughput.
Applications & use cases
Multimodal AI Applications:
• Visual question answering combining document understanding with spatial reasoning
• Video content analysis and summarization for media workflows
• OCR and document processing with 89.2% accuracy on OCRBench
• Mathematical problem solving from images with step-by-step reasoning
Agentic Workflows:
• Function calling and tool use with 66.1% accuracy on BFCL-V4
• Multi-step agent orchestration for complex task automation
• Code generation and debugging with reasoning capabilities
• Autonomous task planning across coding, cybersecurity, finance, and search domains
Production Applications:
• Long-context document analysis up to 262K tokens natively
• Multilingual chatbots and customer support across 201 languages
• RAG systems with vision-language understanding for knowledge retrieval
• Research and data analysis combining text and visual information
Enterprise Solutions:
• Medical image analysis and VQA with specialized training
• Financial document processing with OCR and reasoning
• E-commerce product cataloging from images and videos
• Educational platforms with math and science problem solving
- TypeReasoningVisionChat
- Main use casesChatVisionReasoning
- FeaturesFunction CallingJSON Mode
- SpeedHigh
- IntelligenceHigh
- DeploymentServerlessOn-Demand Dedicated
- Endpoint
- Parameters9B
- Context length262K
- Input price
$0.10 / 1M tokens
- Output price
$0.15 / 1M tokens
- Input modalitiesTextImageVideo
- Output modalitiesText
- ReleasedFebruary 23, 2026
- CategoryChat