
Qwen3.5-397B-A17B

Native multimodal model with efficient hybrid architecture for global deployment.

About model

Qwen3.5 is a native vision-language foundation model with 397B total parameters (17B activated) built on a sparse MoE architecture. Through early-fusion training on multimodal tokens, Qwen3.5 achieves cross-generational parity with Qwen3 while outperforming Qwen3-VL models across reasoning, coding, agent, and visual-understanding benchmarks. Its hybrid design combines Gated Delta Networks with sparse Mixture-of-Experts, delivering 8.6-19x faster decoding than Qwen3-Max with minimal latency overhead. Reinforcement learning scaled across million-agent environments provides robust real-world adaptability, and expanded support for 201 languages and dialects enables inclusive worldwide deployment on Together AI's production infrastructure.

Total Parameters: 397B (17B activated), native multimodal MoE architecture
Faster Decoding: 8.6-19x vs Qwen3-Max, hybrid Gated Delta Networks
Languages & Dialects: 201, global deployment with cultural understanding

Model key capabilities
  • Native Multimodal Foundation: Early fusion training across text, images, and video (87.8% MMLU-Pro, 88.6% MathVision, 87.5% VideoMME)
  • Hybrid Efficiency: Gated Delta Networks + sparse MoE delivering 8.6-19x faster decoding with 17B activated parameters
  • Global Multilingual: 201 languages and dialects (88.5% MMMLU, 78.9% WMT24++ across 55 languages)
  • Production-Ready Infrastructure: 99.9% SLA, available on serverless and dedicated infrastructure

API usage

    Endpoint: Qwen/Qwen3.5-397B-A17B

    cURL:

    curl -X POST https://api.together.xyz/v1/chat/completions \
      -H "Content-Type: application/json" \
      -H "Authorization: Bearer $TOGETHER_API_KEY" \
      -d '{
        "model": "Qwen/Qwen3.5-397B-A17B",
        "messages": [{
          "role": "user",
          "content": "Given two binary strings `a` and `b`, return their sum as a binary string"
        }]
      }'
    
    Python:

    from together import Together

    client = Together()

    response = client.chat.completions.create(
        model="Qwen/Qwen3.5-397B-A17B",
        messages=[
            {
                "role": "user",
                "content": "Given two binary strings `a` and `b`, return their sum as a binary string"
            }
        ],
    )

    print(response.choices[0].message.content)
    
    
    TypeScript:

    import Together from "together-ai";
    
    const together = new Together();
    
    async function main() {
      const response = await together.chat.completions.create({
        model: "Qwen/Qwen3.5-397B-A17B",
        messages: [{
          role: "user",
          content: "Given two binary strings `a` and `b`, return their sum as a binary string"
        }]
      });
      
      console.log(response.choices[0]?.message?.content);
    }
    
    main();
    
    
Model card

    Architecture Overview:
    • Native vision-language foundation model with early fusion training on multimodal tokens
    • 397B total parameters with 17B activated per forward pass via sparse MoE routing
    • 512 experts with 10 routed + 1 shared expert activated per token
    • Hybrid architecture: Gated Delta Networks + sparse Mixture-of-Experts
    • 60 layers with 15 × (3 × (Gated DeltaNet → MoE) → 1 × (Gated Attention → MoE)) layout (sketched after this list)
    • Gated DeltaNet: 64 linear attention heads for V, 16 for QK, head dimension 128
    • Gated Attention: 32 heads for Q, 2 for KV, head dimension 256, RoPE dimension 64
    • 248,320 token vocabulary (padded) with 15-25% lower token counts on technical datasets
    • 256K native context (262,144 tokens), extensible to 1M tokens via YaRN scaling
    • High-resolution vision: up to 1344x1344 pixels, UI screenshots with pixel-perfect element detection
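
    The stated layer budget is easy to sanity-check. A minimal sketch using only the figures from this list (15 blocks of three Gated DeltaNet layers plus one Gated Attention layer, each paired with an MoE FFN; 10 routed + 1 shared of 512 experts per token); the layer names are descriptive labels, not the model's actual module names:

    # Sanity-check of the published 60-layer hybrid layout; labels are
    # illustrative, not the model's real module names.
    layers = []
    for block in range(15):
        layers += [("gated_deltanet", "moe")] * 3   # linear-attention layers
        layers += [("gated_attention", "moe")]      # full-attention layer

    assert len(layers) == 60                        # matches the stated depth

    # Per-token expert activation from the list above: 10 routed + 1 shared.
    routed, shared, total_experts = 10, 1, 512
    print(f"{len(layers)} layers; {routed + shared}/{total_experts} experts active per token")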

    Training Methodology:
    • Early fusion multimodal training achieving near-100% efficiency vs text-only training
    • Reinforcement learning scaled across million-agent environments with progressively complex task distributions
    • Asynchronous RL frameworks supporting massive-scale agent scaffolds and environment orchestration
    • Trained for robust real-world adaptability across reasoning, coding, agents, and visual understanding
    • 201 languages and dialects with nuanced cultural and regional understanding
    • Multi-token prediction (MTP) training for enhanced inference efficiency

    Performance Characteristics:
    • Reasoning Excellence: 87.8% MMLU-Pro, 91.3% AIME26, 83.6% LiveCodeBench v6, 88.4% GPQA
    • Coding Leadership: 76.4% SWE-Bench Verified, 69.3% SWE-Bench Multilingual, 68.3% SecCodeBench
    • Agentic Performance: 69.0/78.6% BrowseComp, 74.0% WideSearch, 72.9% BFCL-V4, 86.7% TAU2-Bench
    • Multilingual SOTA: 88.5% MMMLU, 84.7% MMLU-ProX (29 languages), 78.9% WMT24++ (55 languages)
    • Vision Language: 88.6% MathVision, 90.3% MathVista, 85.0% MMMU, 79.0% MMMU-Pro
    • Video Understanding: 87.5% VideoMME (w/ sub), 84.7% VideoMMMU, 86.7% MLVU
    • Document Understanding: 90.8% OmniDocBench1.5, 93.1% OCRBench, 82.0% CC-OCR
    • Efficiency: 8.6-19x faster decoding vs Qwen3-Max at 32K-256K context lengths

Applications & use cases

    Native Multimodal Reasoning:
    • Vision-language foundation with early fusion training across text, image, and video
    • 87.8% MMLU-Pro, 91.3% AIME26, 83.6% LiveCodeBench v6, 88.4% GPQA Diamond
    • Math vision: 88.6% MathVision, 90.3% MathVista (mini), 87.9% We-Math
    • High-resolution image understanding: up to 1344x1344 pixels, UI element detection (request format sketched after this list)
    • Video understanding: 87.5% VideoMME (w/ subtitles), 84.7% VideoMMMU, 86.7% MLVU
    • Document processing: 90.8% OmniDocBench1.5, 93.1% OCRBench, 82.0% CC-OCR
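
    A minimal sketch of an image request, assuming this model accepts the standard OpenAI-style image_url content part on Together's chat completions endpoint; the image URL is a placeholder:

    from together import Together

    client = Together()

    # Text plus an image content part; the image URL is a placeholder.
    response = client.chat.completions.create(
        model="Qwen/Qwen3.5-397B-A17B",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this chart and extract its key figures."},
                {"type": "image_url", "image_url": {"url": "https://example.com/chart.png"}},
            ],
        }],
    )

    print(response.choices[0].message.content)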

    SOTA Coding & Agentic Performance:
    • Autonomous coding: 76.4% SWE-Bench Verified, 69.3% SWE-Bench Multilingual, 68.3% SecCodeBench
    • Agentic workflows: 69.0/78.6% BrowseComp, 74.0% WideSearch, 86.7% TAU2-Bench
    • Tool orchestration: 72.9% BFCL-V4, 38.3% Tool Decathlon, 46.1% MCP-Mark
    • Visual agents: 65.6% ScreenSpot Pro, 62.2% OSWorld-Verified, 66.8% AndroidWorld
    • Search with tools: 48.3% HLE w/ tool, 70.3% BrowseComp-zh

    Global Multilingual Deployment:
    • 201 languages and dialects with nuanced cultural and regional understanding
    • Multilingual excellence: 88.5% MMMLU, 84.7% MMLU-ProX (29 languages)
    • Translation quality: 78.9% WMT24++ across 55 languages using XCOMET-XXL
    • Cross-lingual reasoning: 59.1% NOVA-63, 85.6% INCLUDE, 89.8% Global PIQA
    • Multilingual math: 73.3% PolyMATH, 88.2% MAXIFE across 23 settings
    • Instruction following: 76.5% IFBench, 67.6% MultiChallenge across languages

    Production Efficiency at Scale:
    • 8.6-19x faster decoding than Qwen3-Max at 32K-256K context lengths
    • Hybrid Gated Delta Networks architecture with minimal latency overhead
    • 397B total parameters with only 17B activated per token via sparse MoE
    • 256K native context (262,144 tokens), extensible to 1M via YaRN RoPE scaling (config sketch after this list)
    • Multi-token prediction (MTP) for enhanced inference throughput
    • 248,320 token vocabulary reducing token counts 15-25% on technical datasets
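
    For self-hosted deployments, earlier Qwen releases extend context with a YaRN rope_scaling entry in the Hugging Face config. A minimal sketch patterned on those releases; the keys and scaling factor for Qwen3.5 are assumptions, so check the model card before relying on them:

    # Hypothetical rope_scaling entry patterned on earlier Qwen releases;
    # keys and values for Qwen3.5 are assumptions, not confirmed settings.
    rope_scaling = {
        "rope_type": "yarn",
        "factor": 4.0,                              # 262,144 x 4 ≈ 1M-token target
        "original_max_position_embeddings": 262144,
    }
    print(rope_scaling)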

    Visual Agentic Workflows:
    • Desktop screenshot understanding with UI element identification and workflow planning
    • Pixel-perfect element detection for UI automation and testing
    • Executable action generation for autonomous task completion
    • Native tool calling for web search, code execution, and API orchestration (sketch after this list)
    • Visual agents: 65.6% ScreenSpot Pro, 62.2% OSWorld-Verified, 66.8% AndroidWorld
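
    Tool calling uses the OpenAI-style tools parameter on the chat completions endpoint. A minimal sketch; get_weather is a made-up illustrative tool, and this model's exact tool-call behavior is an assumption:

    from together import Together

    client = Together()

    # OpenAI-style tool definition; get_weather is a made-up example tool.
    tools = [{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }]

    response = client.chat.completions.create(
        model="Qwen/Qwen3.5-397B-A17B",
        messages=[{"role": "user", "content": "What's the weather in Lisbon?"}],
        tools=tools,
    )

    # If the model chose to call the tool, the structured call is on the message.
    message = response.choices[0].message
    if message.tool_calls:
        print(message.tool_calls[0].function.name, message.tool_calls[0].function.arguments)
    else:
        print(message.content)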

    Long-Context Knowledge Work:
    • 256K native context supporting entire codebases, video transcripts, technical reports
    • Extensible to 1M tokens for processing hour-scale videos and massive documents
    • Long-context benchmarks: 63.2% LongBench v2, 68.7% AA-LCR
    • Document understanding: 90.8% OmniDocBench1.5 with long-form comprehension
    • Eliminates chunking for large-scale technical documentation and research (single-request sketch below)
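
    Because long inputs fit in one request, whole documents can be sent without a chunking pipeline. A minimal sketch; report.txt is a placeholder path, and very large files should still be checked against the 256K-token budget:

    from together import Together

    client = Together()

    # Send an entire report in a single request; no chunking pipeline needed.
    # "report.txt" is a placeholder path.
    with open("report.txt", encoding="utf-8") as f:
        document = f.read()

    response = client.chat.completions.create(
        model="Qwen/Qwen3.5-397B-A17B",
        messages=[{
            "role": "user",
            "content": f"Summarize the key findings of this report:\n\n{document}",
        }],
    )

    print(response.choices[0].message.content)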

    Thinking Mode for Complex Reasoning:
    • Default thinking mode generates step-by-step reasoning before final responses
    • Enhanced performance on math competitions (91.3% AIME26), coding challenges (83.6% LiveCodeBench)
    • Can be disabled for direct responses in conversational or low-latency applications
    • Configurable via API parameters without model changes (toggle sketch after this list)
    • Optimal for complex problem-solving, mathematical proofs, algorithm design
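
    Earlier Qwen3 releases expose a soft switch for this: appending /no_think to the user turn disables the reasoning trace. A minimal sketch assuming Qwen3.5 keeps that switch; it may instead use a dedicated API parameter, so check the model card:

    from together import Together

    client = Together()

    # /no_think soft switch from earlier Qwen3 releases; whether Qwen3.5
    # honors it is an assumption, not a confirmed behavior.
    response = client.chat.completions.create(
        model="Qwen/Qwen3.5-397B-A17B",
        messages=[{"role": "user", "content": "What is the capital of Portugal? /no_think"}],
    )

    print(response.choices[0].message.content)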

Model specifications
  • Model provider
    Qwen
  • Type
    Code
    Chat
  • Main use cases
    Function Calling
    Vision
  • Features
    Function Calling
    JSON Mode
  • Speed
    Medium
  • Intelligence
    Very High
  • Deployment
    Serverless
    Monthly Reserved
  • Parameters
    403.4B
  • Context length
    256K
  • Input price
    $0.60 / 1M tokens
  • Output price
    $3.60 / 1M tokens (cost example at the end of this list)
  • Input modalities
    Text
    Image
  • Output modalities
    Text
  • Released
    February 15, 2026
  • Last updated
    February 15, 2026
  • Quantization level
    FP4
  • Category
    Code
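
A quick cost check against the listed prices: at $0.60 per 1M input tokens and $3.60 per 1M output tokens, a request with 10,000 input and 2,000 output tokens costs $0.006 + $0.0072 = $0.0132. The sketch below restates that arithmetic; the token counts are made-up example values:

    # Cost check against the listed serverless prices; token counts are
    # made-up example values.
    INPUT_PRICE_PER_M = 0.60    # USD per 1M input tokens
    OUTPUT_PRICE_PER_M = 3.60   # USD per 1M output tokens

    input_tokens, output_tokens = 10_000, 2_000
    cost = (input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000
    print(f"${cost:.4f}")       # $0.0132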