Qwen3.5-397B-A17B API

This model isn’t available on Together’s Serverless API.
Deploy this model on an on-demand Dedicated Endpoint or pick a supported alternative from the Model Library.
Qwen3.5 is a native vision-language foundation model with 397B total parameters (17B activated) built on a sparse MoE architecture. Through early fusion training on multimodal tokens, Qwen3.5 achieves cross-generational parity with Qwen3 while outperforming Qwen3-VL models across reasoning, coding, agent, and visual-understanding benchmarks. The model pairs Gated Delta Networks with sparse Mixture-of-Experts, delivering 8.6x-19x faster decoding than Qwen3-Max with minimal latency overhead. Reinforcement learning scaled across million-agent environments provides robust real-world adaptability, and expanded support for 201 languages and dialects enables inclusive worldwide deployment on Together AI's production infrastructure.
Qwen3.5-397B-A17B API Usage
How to use Qwen3.5-397B-A17B
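Because the model runs on a Dedicated Endpoint rather than the Serverless API, a minimal sketch of a chat completion call with Together's Python SDK is shown below. The model identifier is an assumption; replace it with the model string of your deployed endpoint.

```python
from together import Together

# Assumes TOGETHER_API_KEY is set in the environment and that a Dedicated
# Endpoint for this model has already been deployed.
client = Together()

response = client.chat.completions.create(
    model="Qwen/Qwen3.5-397B-A17B",  # illustrative identifier; use your endpoint's model string
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize the trade-offs of sparse MoE models."},
    ],
    max_tokens=512,
    temperature=0.7,
)

print(response.choices[0].message.content)
```

The same OpenAI-compatible request shape applies whether the endpoint is called through the Together SDK or the OpenAI client pointed at https://api.together.xyz/v1.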
Model details
Architecture Overview:
• Native vision-language foundation model with early fusion training on multimodal tokens
• 397B total parameters with 17B activated per forward pass via sparse MoE routing
• 512 experts with 10 routed + 1 shared expert activated per token
• Hybrid architecture: Gated Delta Networks + sparse Mixture-of-Experts
• 60 layers with 15 × (3 × (Gated DeltaNet → MoE) → 1 × (Gated Attention → MoE)) layout (see the layout sketch after this list)
• Gated DeltaNet: 64 linear attention heads for V, 16 for QK, head dimension 128
• Gated Attention: 32 heads for Q, 2 for KV, head dimension 256, RoPE dimension 64
• 248,320 token vocabulary (padded) with 15-25% lower token counts on technical datasets
• 256K native context (262,144 tokens), extensible to 1M tokens via YaRN scaling
• High-resolution vision: up to 1344x1344 pixels, enabling pixel-perfect element detection in UI screenshots
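The repeating hybrid layout referenced above can be enumerated explicitly. The sketch below simply reconstructs the 60-layer ordering implied by 15 groups of 3 × (Gated DeltaNet → MoE) followed by 1 × (Gated Attention → MoE); it is illustrative only and does not implement the blocks themselves.

```python
# Reconstruct the stated hybrid layout: 15 groups, each with
# 3 Gated DeltaNet->MoE layers followed by 1 Gated Attention->MoE layer.
GROUPS = 15
DELTANET_PER_GROUP = 3

layout = []
for _ in range(GROUPS):
    layout.extend(["GatedDeltaNet+MoE"] * DELTANET_PER_GROUP)
    layout.append("GatedAttention+MoE")

assert len(layout) == 60                       # 15 * (3 + 1) layers in total
print(layout.count("GatedDeltaNet+MoE"))       # 45 linear-attention layers
print(layout.count("GatedAttention+MoE"))      # 15 full-attention layers
```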
Training Methodology:
• Early fusion multimodal training achieving near-100% efficiency vs text-only training
• Reinforcement learning scaled across million-agent environments with progressively complex task distributions
• Asynchronous RL frameworks supporting massive-scale agent scaffolds and environment orchestration
• Trained for robust real-world adaptability across reasoning, coding, agents, and visual understanding
• 201 languages and dialects with nuanced cultural and regional understanding
• Multi-token prediction (MTP) training for enhanced inference efficiency
Performance Characteristics:
• Reasoning Excellence: 87.8% MMLU-Pro, 91.3% AIME26, 83.6% LiveCodeBench v6, 88.4% GPQA
• Coding Leadership: 76.4% SWE-Bench Verified, 69.3% SWE-Bench Multilingual, 68.3% SecCodeBench
• Agentic Performance: 69.0/78.6% BrowseComp, 74.0% WideSearch, 72.9% BFCL-V4, 86.7% TAU2-Bench
• Multilingual SOTA: 88.5% MMMLU, 84.7% MMLU-ProX (29 languages), 78.9% WMT24++ (55 languages)
• Vision Language: 88.6% MathVision, 90.3% MathVista, 85.0% MMMU, 79.0% MMMU-Pro
• Video Understanding: 87.5% VideoMME (w/ sub), 84.7% VideoMMMU, 86.7% MLVU
• Document Understanding: 90.8% OmniDocBench1.5, 93.1% OCRBench, 82.0% CC-OCR
• Efficiency: 8.6x-19x faster decoding vs Qwen3-Max at 32k-256k context lengths
Prompting Qwen3.5-397B-A17B
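A minimal prompting sketch is shown below, streaming tokens as they are generated. It assumes the Together Python SDK and reuses the illustrative model identifier from the usage example above.

```python
from together import Together

client = Together()

# Stream the response token by token rather than waiting for the full completion.
stream = client.chat.completions.create(
    model="Qwen/Qwen3.5-397B-A17B",  # illustrative identifier
    messages=[
        {"role": "user", "content": "Prove that the sum of two even integers is even."}
    ],
    stream=True,
)

for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```

In the default thinking mode, the streamed output typically begins with the model's step-by-step reasoning before the final answer; see the Thinking Mode section below for disabling it.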
Applications & Use Cases
Native Multimodal Reasoning:
• Vision-language foundation with early fusion training across text, image, and video
• 87.8% MMLU-Pro, 91.3% AIME26, 83.6% LiveCodeBench v6, 88.4% GPQA Diamond
• Math vision: 88.6% MathVision, 90.3% MathVista (mini), 87.9% We-Math
• High-resolution image understanding: up to 1344x1344 pixels, UI element detection (see the image-input sketch after this list)
• Video understanding: 87.5% VideoMME (w/ subtitles), 84.7% VideoMMMU, 86.7% MLVU
• Document processing: 90.8% OmniDocBench1.5, 93.1% OCRBench, 82.0% CC-OCR
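A sketch of an image-plus-text request is shown below, using the OpenAI-compatible content-parts format that Together exposes for vision models; whether this exact payload shape applies to this model on a Dedicated Endpoint is an assumption, and the image URL is a placeholder.

```python
from together import Together

client = Together()

# Combine a text question with an image URL in a single user message.
response = client.chat.completions.create(
    model="Qwen/Qwen3.5-397B-A17B",  # illustrative identifier
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this chart and extract the plotted values."},
                {"type": "image_url", "image_url": {"url": "https://example.com/chart.png"}},
            ],
        }
    ],
)

print(response.choices[0].message.content)
```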
SOTA Coding & Agentic Performance:
• Autonomous coding: 76.4% SWE-Bench Verified, 69.3% SWE-Bench Multilingual, 68.3% SecCodeBench
• Agentic workflows: 69.0/78.6% BrowseComp, 74.0% WideSearch, 86.7% TAU2-Bench
• Tool orchestration: 72.9% BFCL-V4, 38.3% Tool Decathlon, 46.1% MCP-Mark (see the tool-calling sketch after this list)
• Visual agents: 65.6% ScreenSpot Pro, 62.2% OSWorld-Verified, 66.8% AndroidWorld
• Search with tools: 48.3% HLE w/ tool, 70.3% BrowseComp-zh
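A minimal tool-calling sketch is shown below, using the OpenAI-compatible tools parameter; the get_weather function is hypothetical, and whether tool calling is enabled for this model on a given endpoint is an assumption.

```python
import json
from together import Together

client = Together()

# One illustrative tool definition in OpenAI-compatible function-calling format.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

response = client.chat.completions.create(
    model="Qwen/Qwen3.5-397B-A17B",  # illustrative identifier
    messages=[{"role": "user", "content": "What's the weather in Tokyo right now?"}],
    tools=tools,
)

# If the model decided to call a tool, inspect the structured call(s).
for call in response.choices[0].message.tool_calls or []:
    print(call.function.name, json.loads(call.function.arguments))
```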
Global Multilingual Deployment:
• 201 languages and dialects with nuanced cultural and regional understanding
• Multilingual excellence: 88.5% MMMLU, 84.7% MMLU-ProX (29 languages)
• Translation quality: 78.9% WMT24++ across 55 languages using XCOMET-XXL
• Cross-lingual reasoning: 59.1% NOVA-63, 85.6% INCLUDE, 89.8% Global PIQA
• Multilingual math: 73.3% PolyMATH, 88.2% MAXIFE across 23 settings
• Instruction following: 76.5% IFBench, 67.6% MultiChallenge across languages
Production Efficiency at Scale:
• 8.6x-19x faster decoding than Qwen3-Max at 32k-256k context lengths
• Hybrid Gated Delta Networks architecture with minimal latency overhead
• 397B total parameters with only 17B activated per token via sparse MoE
• 256K native context (262,144 tokens), extensible to 1M via YaRN RoPE scaling (see the configuration sketch after this list)
• Multi-token prediction (MTP) for enhanced inference throughput
• 248,320 token vocabulary reducing token counts 15-25% on technical datasets
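For self-hosted Qwen-family checkpoints, YaRN context extension is typically enabled by adding a rope_scaling block to the model's config.json, as sketched below. The keys, the factor of 4 (262,144 × 4 ≈ 1M tokens), and the local path are assumptions for this model; a managed Together endpoint would expose long-context configuration differently.

```python
import json
from pathlib import Path

# Hypothetical local checkpoint directory; adjust to where the weights are stored.
config_path = Path("Qwen3.5-397B-A17B/config.json")
config = json.loads(config_path.read_text())

# Qwen-style YaRN scaling: extend the 262,144-token native window by ~4x toward 1M tokens.
config["rope_scaling"] = {
    "rope_type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 262144,
}

config_path.write_text(json.dumps(config, indent=2))
```

Static YaRN scaling applies to every request, so it is generally worth enabling only when prompts actually exceed the native window.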
Visual Agentic Workflows:
• Desktop screenshot understanding with UI element identification and workflow planning
• Pixel-perfect element detection for UI automation and testing (see the screenshot sketch after this list)
• Executable action generation for autonomous task completion
• Native tool calling for web search, code execution, and API orchestration
• Visual agents: 65.6% ScreenSpot Pro, 62.2% OSWorld-Verified, 66.8% AndroidWorld
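One way a single step of such a visual agent loop might look is sketched below, reusing the image content-parts format from the earlier vision example with a local screenshot encoded as a data URL. The prompt and the expected JSON reply are illustrative conventions, not a documented output contract.

```python
import base64
from together import Together

client = Together()

# Encode a local desktop screenshot as a base64 data URL (hypothetical path).
image_bytes = open("screenshot.png", "rb").read()
data_url = "data:image/png;base64," + base64.b64encode(image_bytes).decode()

response = client.chat.completions.create(
    model="Qwen/Qwen3.5-397B-A17B",  # illustrative identifier
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": 'Locate the "Submit" button and reply with JSON like '
                            '{"x": <int>, "y": <int>, "action": "click"}.',
                },
                {"type": "image_url", "image_url": {"url": data_url}},
            ],
        }
    ],
)

print(response.choices[0].message.content)  # e.g. {"x": 812, "y": 640, "action": "click"}
```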
Long-Context Knowledge Work:
• 256K native context supporting entire codebases, video transcripts, technical reports
• Extensible to 1M tokens for processing hour-scale videos and massive documents
• Long-context benchmarks: 63.2% LongBench v2, 68.7% AA-LCR
• Document understanding: 90.8% OmniDocBench1.5 with long-form comprehension
• Eliminates chunking for large-scale technical documentation and research
Thinking Mode for Complex Reasoning:
• Default thinking mode generates step-by-step reasoning before final responses
• Enhanced performance on math competitions (91.3% AIME26), coding challenges (83.6% LiveCodeBench)
• Can be disabled for direct responses in conversational or low-latency applications
• Configurable via API parameters without model changes (see the sketch after this list)
• Optimal for complex problem-solving, mathematical proofs, algorithm design
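The exact parameter for toggling thinking mode is not documented here; Qwen3-family checkpoints commonly expose an enable_thinking chat-template flag on OpenAI-compatible servers, which the sketch below assumes, passed through the OpenAI client's extra_body so it reaches the endpoint unchanged.

```python
from openai import OpenAI

# Point the OpenAI-compatible client at Together's API (or your Dedicated Endpoint URL).
client = OpenAI(
    base_url="https://api.together.xyz/v1",
    api_key="YOUR_TOGETHER_API_KEY",
)

response = client.chat.completions.create(
    model="Qwen/Qwen3.5-397B-A17B",  # illustrative identifier
    messages=[{"role": "user", "content": "Give a one-line summary of YaRN scaling."}],
    # Assumed flag name: Qwen3-family chat templates typically accept enable_thinking.
    extra_body={"chat_template_kwargs": {"enable_thinking": False}},
)

print(response.choices[0].message.content)
```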
