Reasoning
Vision
Chat

Qwen3.5 9B

Multimodal reasoning model with native tool calling on Together AI

Native Context

262K

Extensible to 1M+ tokens with RoPE scaling

BFCL-V4

66.1%

Native function calling for production agents

Languages

201

Global coverage with 81.2% MMMLU

Model key capabilities
  • Multimodal Reasoning: Unified text, image, and video understanding with 89.2% OCRBench, 84.5% VideoMME, and 78.9% MathVision
  • Native Tool Calling: Production-ready function calling with 66.1% BFCL-V4 and 79.1% TAU2-Bench across multi-agent workflows
  • Thinking Mode: Generates explicit reasoning traces before responses for improved accuracy on complex tasks
  • Global & Long Context: 201 languages with 81.2% MMMLU and 262K native context extensible to 1M+ tokens
  • API usage

    • cURL
    • Python
    • TypeScript

    Endpoint:

    Qwen/Qwen3.5-9B

    curl -X POST "https://api.together.xyz/v1/chat/completions" \
      -H "Authorization: Bearer $TOGETHER_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{
        "model": "Qwen/Qwen3.5-9B",
        "messages": [
          {
            "role": "user",
            "content": "What are some fun things to do in New York?"
          }
        ]
    }'
    
    from together import Together
    
    client = Together()
    
    response = client.chat.completions.create(
      model="Qwen/Qwen3.5-9B",
      messages=[
        {
          "role": "user",
          "content": "What are some fun things to do in New York?"
        }
      ]
    )
    print(response.choices[0].message.content)
    
    import Together from 'together-ai';
    const together = new Together();
    
    const completion = await together.chat.completions.create({
      model: 'Qwen/Qwen3.5-9B',
      messages: [
        {
          role: 'user',
          content: 'What are some fun things to do in New York?'
         }
      ],
    });
    
    console.log(completion.choices[0].message.content);
    
  • Model card

    Architecture Overview:
    • Hybrid Gated DeltaNet and Gated Attention architecture for efficient inference with reduced latency
    • 9 billion parameters optimized for multimodal understanding across text, images, and video
    • 262,144 token native context window, extensible to 1M+ tokens with RoPE scaling
    • Vision encoder supporting image and video inputs for cross-modal reasoning tasks

    Training Methodology:
    • Early fusion training on multimodal tokens achieving cross-generational performance parity
    • Reinforcement learning scaled across million-agent environments for robust real-world adaptability
    • Multi-token prediction (MTP) training for improved generation efficiency
    • Trained with progressively complex task distributions for production-grade reliability

    Performance Characteristics:
    • Strong mathematical reasoning: 78.9% on MathVision, 70.1% on MMMU-Pro
    • Competitive coding performance: 65.6% on LiveCodeBench v6
    • Leading agent capabilities: 66.1% on BFCL-V4, 79.1% on TAU2-Bench for function calling
    • Superior vision understanding: 84.5% on VideoMME, 89.2% on OCRBench
    • Extensive multilingual support across 201 languages with 81.2% on MMMLU
    • Long-context performance: 63.0% on AA-LCR, 55.2% on LongBench v2

  • Prompting

    API Access:
    • Access Qwen3.5 9B via Together AI APIs using the endpoint Qwen/Qwen3.5-9B
    • Standard Together AI authentication with API key
    • Supports text, image, and video inputs through unified chat interface
    • Native tool calling with Qwen3 Coder parser for agentic workflows

    Thinking Mode:
    • Qwen3.5 operates in thinking mode by default, generating reasoning content before final responses
    • Disable thinking mode via chat_template_kwargs: {"enable_thinking": False} for direct responses
    • Recommended sampling: temperature=1.0, top_p=0.95, top_k=20, presence_penalty=1.5 for thinking mode

    Note: Use a presence_penalty between 0 and 2 to reduce repetition. Multi-token prediction mode is available for improved throughput.
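The thinking-mode settings above can be sketched as a raw request body (a minimal sketch: the `chat_template_kwargs` field follows the note above, the sampling values are the recommended thinking-mode defaults, and the prompt text is illustrative):

```python
import json

# Request body for the chat completions endpoint with thinking mode
# disabled for a direct response (no reasoning trace).
payload = {
    "model": "Qwen/Qwen3.5-9B",
    "messages": [
        {"role": "user", "content": "Summarize RoPE scaling in one sentence."}
    ],
    "chat_template_kwargs": {"enable_thinking": False},
}

# Recommended sampling when thinking mode stays enabled (the default):
thinking_sampling = {
    "temperature": 1.0,
    "top_p": 0.95,
    "top_k": 20,
    "presence_penalty": 1.5,
}

# Serialized body, ready to POST with a Bearer token as in the cURL example.
body = json.dumps(payload)
```

Either dict can be merged into the same request body; the sampling keys sit at the top level alongside `model` and `messages`.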

  • Applications & use cases

    Multimodal AI Applications:
    • Visual question answering combining document understanding with spatial reasoning
    • Video content analysis and summarization for media workflows
    • OCR and document processing with 89.2% accuracy on OCRBench
    • Mathematical problem solving from images with step-by-step reasoning
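The image workflows above go through the same chat interface using OpenAI-style content parts (a minimal sketch: the image URL is a placeholder, and the content-part shape assumes the standard OpenAI-compatible format):

```python
import json

IMAGE_URL = "https://example.com/invoice.png"  # placeholder image

# A user turn mixing text and an image as a list of content parts.
payload = {
    "model": "Qwen/Qwen3.5-9B",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Extract the total amount from this invoice."},
                {"type": "image_url", "image_url": {"url": IMAGE_URL}},
            ],
        }
    ],
}

# POST this as JSON to the chat completions endpoint, as in the cURL example.
print(json.dumps(payload, indent=2))
```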

    Agentic Workflows:
    • Function calling and tool use with 66.1% accuracy on BFCL-V4
    • Multi-step agent orchestration for complex task automation
    • Code generation and debugging with reasoning capabilities
    • Autonomous task planning across coding, cybersecurity, finance, and search domains
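A function-calling request can be sketched as follows (a minimal sketch: the `get_weather` tool and its schema are hypothetical, and the `tools` field assumes the standard OpenAI-compatible shape):

```python
import json

# Hypothetical tool definition in the OpenAI-compatible "tools" format.
get_weather = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

payload = {
    "model": "Qwen/Qwen3.5-9B",
    "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
    "tools": [get_weather],
    "tool_choice": "auto",
}

# A tool-calling response carries message.tool_calls entries whose
# function.arguments field is a JSON string, parsed like so:
args = json.loads('{"city": "Paris"}')
```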

    Production Applications:
    • Long-context document analysis up to 262K tokens natively
    • Multilingual chatbots and customer support across 201 languages
    • RAG systems with vision-language understanding for knowledge retrieval
    • Research and data analysis combining text and visual information

    Enterprise Solutions:
    • Medical image analysis and VQA with specialized training
    • Financial document processing with OCR and reasoning
    • E-commerce product cataloging from images and videos
    • Educational platforms with math and science problem solving

Model specifications
  • Model provider
    Qwen
  • Type
    Reasoning
    Vision
    Chat
  • Main use cases
    Chat
    Vision
    Reasoning
  • Features
    Function Calling
    JSON Mode
  • Speed
    High
  • Intelligence
    High
  • Deployment
    Serverless
    On-Demand Dedicated
  • Parameters
    9B
  • Context length
    262K
  • Input price

    $0.10 / 1M tokens

  • Output price

    $0.15 / 1M tokens

  • Input modalities
    Text
    Image
    Video
  • Output modalities
    Text
  • Released
    February 23, 2026
  • Category
    Chat