Kimi K2.5 API

State-of-the-art multimodal thinking agent with vision and Agent Swarm

This model isn’t available on Together’s Serverless API.

Deploy this model on an on-demand Dedicated Endpoint or pick a supported alternative from the Model Library.

Introducing Kimi K2.5

Kimi K2.5 is Moonshot AI's most capable open-source thinking model, built as a multimodal thinking agent that reasons step by step while dynamically invoking tools. It sets state-of-the-art results on Humanity's Last Exam (HLE), BrowseComp, and other benchmarks, and it scales multi-step reasoning depth while keeping tool use stable across 200–300 sequential calls. Native INT4 quantization delivers a 2x inference speed-up.

• 50.2% on Humanity's Last Exam (w/ tools): expert-level multimodal reasoning across 100+ subjects
• 15T tokens (mixed visual and text): native multimodal pretraining at scale
• 2x inference speed-up: native INT4 quantization with QAT

Key Capabilities

  • ✓ Native Multimodality: Pre-trained on vision-language tokens, excels in visual knowledge, cross-modal reasoning, and agentic tool use grounded in visual inputs
  • ✓ Coding with Vision: Generates code from visual specifications (UI designs, video workflows) and autonomously chains tools for visual data processing
  • ✓ Agent Swarm: Transitions from single-agent scaling to self-directed, coordinated swarm-like execution—decomposes complex tasks into parallel sub-tasks executed by dynamically instantiated, domain-specific agents
  • ✓ Production-Ready Efficiency: Native INT4 quantization achieving lossless 2x speed improvements with 256K context window

Kimi K2.5 API Usage

Endpoint

cURL (text prompt):

curl -X POST "https://api.together.xyz/v1/chat/completions" \
  -H "Authorization: Bearer $TOGETHER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "moonshotai/Kimi-K2.5",
    "messages": [
      {
        "role": "user",
        "content": "What are some fun things to do in New York?"
      }
    ]
  }'

cURL (image input):

curl -X POST "https://api.together.xyz/v1/chat/completions" \
  -H "Authorization: Bearer $TOGETHER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "moonshotai/Kimi-K2.5",
    "messages": [{
      "role": "user",
      "content": [
        {"type": "text", "text": "Describe what you see in this image."},
        {"type": "image_url", "image_url": {"url": "https://huggingface.co/datasets/patrickvonplaten/random_img/resolve/main/yosemite.png"}}
      ]
    }],
    "max_tokens": 512
  }'

cURL (coding prompt):

curl -X POST "https://api.together.xyz/v1/chat/completions" \
  -H "Authorization: Bearer $TOGETHER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "moonshotai/Kimi-K2.5",
    "messages": [{
      "role": "user",
      "content": "Given two binary strings `a` and `b`, return their sum as a binary string"
    }]
  }'

Python (text prompt):

from together import Together

# Together() reads the API key from the TOGETHER_API_KEY environment variable.
client = Together()

response = client.chat.completions.create(
    model="moonshotai/Kimi-K2.5",
    messages=[
        {
            "role": "user",
            "content": "What are some fun things to do in New York?",
        }
    ],
)
print(response.choices[0].message.content)

Python (image input):

from together import Together

client = Together()

# The message content is a list that mixes a text part with an image part.
response = client.chat.completions.create(
    model="moonshotai/Kimi-K2.5",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe what you see in this image."},
                {"type": "image_url", "image_url": {"url": "https://huggingface.co/datasets/patrickvonplaten/random_img/resolve/main/yosemite.png"}},
            ],
        }
    ],
)
print(response.choices[0].message.content)

Python (coding prompt):

from together import Together

client = Together()

response = client.chat.completions.create(
    model="moonshotai/Kimi-K2.5",
    messages=[
        {
            "role": "user",
            "content": "Given two binary strings `a` and `b`, return their sum as a binary string",
        }
    ],
)
print(response.choices[0].message.content)

TypeScript (text prompt):

import Together from "together-ai";

// new Together() reads the API key from the TOGETHER_API_KEY environment variable.
const together = new Together();

const completion = await together.chat.completions.create({
  model: "moonshotai/Kimi-K2.5",
  messages: [
    {
      role: "user",
      content: "What are some fun things to do in New York?",
    },
  ],
});

console.log(completion.choices[0].message.content);

TypeScript (image input):

import Together from "together-ai";

const together = new Together();
const imageUrl = "https://huggingface.co/datasets/patrickvonplaten/random_img/resolve/main/yosemite.png";

async function main() {
  // The message content is an array that mixes a text part with an image part.
  const response = await together.chat.completions.create({
    model: "moonshotai/Kimi-K2.5",
    messages: [{
      role: "user",
      content: [
        { type: "text", text: "Describe what you see in this image." },
        { type: "image_url", image_url: { url: imageUrl } }
      ]
    }]
  });

  console.log(response.choices[0]?.message?.content);
}

main();

TypeScript (coding prompt):

import Together from "together-ai";

const together = new Together();

async function main() {
  const response = await together.chat.completions.create({
    model: "moonshotai/Kimi-K2.5",
    messages: [{
      role: "user",
      content: "Given two binary strings `a` and `b`, return their sum as a binary string"
    }]
  });

  console.log(response.choices[0]?.message?.content);
}

main();

Model details

Architecture Overview:
• Mixture-of-Experts (MoE) architecture with 1T total parameters and 32B activated parameters
• 61 layers in total, including 1 dense layer; 384 experts, with 8 selected per token
• Multi-head Latent Attention (MLA) mechanism with 7168 attention hidden dimension
• Native vision encoder: MoonViT with 400M parameters for vision-language integration
• Native INT4 quantization applied to MoE components through Quantization-Aware Training (QAT)
• 256K context window enabling complex long-horizon multimodal agentic tasks
• 160K vocabulary size with SwiGLU activation function
• Unified architecture combining vision and text, instant and thinking modes, conversational and agentic paradigms
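
To make the sparsity of the Mixture-of-Experts design concrete, the short calculation below works out what share of the model is active for any single token. It uses only the figures quoted in the list above; the variable and function names are illustrative.

```python
# Figures quoted in the architecture overview.
TOTAL_PARAMS = 1_000_000_000_000   # 1T total parameters
ACTIVE_PARAMS = 32_000_000_000     # 32B activated parameters per token
NUM_EXPERTS = 384
EXPERTS_PER_TOKEN = 8


def fraction(part: int, whole: int) -> float:
    """Return part / whole as a float."""
    return part / whole


active_param_share = fraction(ACTIVE_PARAMS, TOTAL_PARAMS)
active_expert_share = fraction(EXPERTS_PER_TOKEN, NUM_EXPERTS)

print(f"Parameters active per token: {active_param_share:.1%}")   # 3.2%
print(f"Experts selected per token:  {active_expert_share:.2%}")  # 2.08%
```

Roughly 3% of the parameters take part in each forward pass, which is why the activated parameter count, rather than the total, is the more useful guide to per-token compute.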

Training Methodology:
• Continual pretraining on approximately 15 trillion mixed visual and text tokens atop Kimi-K2-Base
• Native multimodal training—pre-trained on vision-language tokens for seamless cross-modal reasoning
• End-to-end trained to interleave chain-of-thought reasoning with function calls and visual grounding
• Quantization-Aware Training (QAT) employed for lossless INT4 inference with 2x speed
• Agent Swarm training—transitions from single-agent scaling to self-directed, coordinated swarm-like execution
• Specialized training for parallel task decomposition and domain-specific agent instantiation
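
The interleaving of reasoning and function calls described above can be pictured as a client-side loop: the model either answers or asks for a tool, the tool result is appended to the conversation, and the loop repeats up to a step budget. The sketch below is an assumption-laden illustration, not the model's internal mechanism. The `add` tool, the `fake_model` stand-in, and the `max_steps` default are hypothetical, and the message shapes follow the common OpenAI-style tool-calling convention rather than anything stated on this page.

```python
import json
from typing import Callable


def add(a: float, b: float) -> float:
    """Hypothetical tool exposed to the model."""
    return a + b


# Hypothetical local tool registry, keyed by tool name.
TOOLS: dict[str, Callable[..., object]] = {"add": add}


def run_agent(call_model, messages, max_steps=300):
    """Alternate model turns and tool executions until the model returns
    a plain answer or the step budget runs out."""
    for _ in range(max_steps):
        reply = call_model(messages)          # one assistant message (dict)
        messages.append(reply)
        tool_calls = reply.get("tool_calls") or []
        if not tool_calls:
            return reply["content"]
        for call in tool_calls:
            fn = TOOLS[call["function"]["name"]]
            args = json.loads(call["function"]["arguments"])
            messages.append({
                "role": "tool",
                "tool_call_id": call["id"],
                "content": json.dumps(fn(**args)),
            })
    raise RuntimeError("step budget exhausted without a final answer")


def fake_model(messages):
    """Stand-in for a chat completion call: requests one tool call, then answers."""
    if messages[-1]["role"] == "tool":
        return {"role": "assistant", "content": f"The sum is {messages[-1]['content']}."}
    return {
        "role": "assistant",
        "content": None,
        "tool_calls": [{
            "id": "call_0",
            "type": "function",
            "function": {"name": "add", "arguments": json.dumps({"a": 2, "b": 3})},
        }],
    }


answer = run_agent(fake_model, [{"role": "user", "content": "What is 2 + 3?"}])
print(answer)  # The sum is 5.
```

In a real integration, `call_model` would wrap a chat completion request and return the assistant message; keeping it injectable makes the loop easy to test offline.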

Key Capabilities:
• Native Multimodality: Excels in visual knowledge, cross-modal reasoning, and agentic tool use grounded in visual inputs
• Coding with Vision: Generates code from visual specifications (UI designs, video workflows) and autonomously chains tools for visual data processing
• Agent Swarm: Decomposes complex tasks into parallel sub-tasks executed by dynamically instantiated, domain-specific agents
• Vision benchmarks: 78.5% MMMU-Pro, 84.2% MathVision, 90.1% MathVista, 77.5% CharXiv reasoning

Performance Characteristics:
• State-of-the-art 50.2% on Humanity's Last Exam (HLE) with tools across 100+ expert subjects
• Advanced mathematical reasoning: 96.1% AIME 2025, 95.4% HMMT 2025, 81.8% IMO-AnswerBench, 87.4% GPQA-Diamond
• Strong coding capabilities: 76.8% SWE-Bench Verified, 73.0% SWE-Bench Multilingual, 85.0% LiveCodeBench v6
• Agentic search with swarm: 78.4% BrowseComp (swarm mode), 57.5% Seal-0
• Long-context excellence: 79.3% on AA-LCR (avg@3), 69.4% LongBench-v2 (128K context)
• 2x generation speed improvement through native INT4 quantization without performance degradation
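
For readers unfamiliar with INT4 quantization, the sketch below shows the basic idea on a single group of weights: floats are mapped to 4-bit integer codes in the range -8 to 7 with one shared scale, then mapped back. This is generic symmetric quantization written for illustration. The exact scheme, group size, and QAT procedure used for Kimi K2.5 are not described on this page, so every name and value here is an assumption.

```python
def quantize_int4(weights: list[float]) -> tuple[list[int], float]:
    """Symmetric INT4 quantization of one weight group.

    Maps floats to integer codes in [-8, 7] using a single shared scale.
    """
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 7 if max_abs > 0 else 1.0
    codes = [max(-8, min(7, round(w / scale))) for w in weights]
    return codes, scale


def dequantize_int4(codes: list[int], scale: float) -> list[float]:
    """Map integer codes back to approximate float weights."""
    return [c * scale for c in codes]


group = [0.7, -0.42, 0.14, 0.0]       # illustrative weights, not real model values
codes, scale = quantize_int4(group)
restored = dequantize_int4(codes, scale)

print(codes)  # [7, -4, 1, 0]
print([round(r, 3) for r in restored])
```

Each restored weight differs from the original by at most half the scale. Quantization-aware training exposes the model to this rounding during training so that it can adapt to it, which is the usual motivation for QAT over quantizing after the fact.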

Applications & Use Cases

Multimodal Agentic Reasoning:
• Expert-level reasoning across 100+ subjects achieving 50.2% on Humanity's Last Exam with tools
• Vision-grounded reasoning: 78.5% MMMU-Pro, 84.2% MathVision, 90.1% MathVista
• Cross-modal problem solving combining visual understanding with mathematical and logical reasoning
• PhD-level mathematical problem solving: 96.1% AIME 2025, 95.4% HMMT 2025
• Dynamic hypothesis generation from visual and textual inputs with evidence verification

Coding with Vision:
• Generate code from visual specifications: UI designs, mockups, and video workflows
• Autonomous tool chaining for visual data processing and analysis
• Production-level coding: 76.8% SWE-Bench Verified, 73.0% SWE-Bench Multilingual
• Frontend development from visual designs: fully functional HTML, React, and responsive web applications
• Video-to-code generation: analyze video workflows and generate implementation code
• Competitive programming: 85.0% LiveCodeBench v6, 53.6% OJ-Bench
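
A design-to-code request usually starts from a local screenshot or mockup rather than a public URL. The helper below is a minimal sketch of how such a request could be assembled, using the same `text` plus `image_url` message shape as the image examples on this page. It assumes the endpoint accepts base64 data URLs in the `url` field, which should be checked against the API documentation; the function name and the placeholder bytes are illustrative.

```python
import base64


def image_message(prompt: str, image_bytes: bytes, mime_type: str = "image/png") -> dict:
    """Build a user message pairing a text prompt with an inline image."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": f"data:{mime_type};base64,{encoded}"}},
        ],
    }


# Placeholder bytes stand in for a real mockup read with open(path, "rb").read().
message = image_message("Write the HTML and CSS for this mockup.", b"\x89PNG placeholder")
print(message["content"][0]["text"])
```

The resulting dictionary can be passed as one element of the `messages` list in a chat completion request.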

Agent Swarm Orchestration:
• Self-directed task decomposition into parallel sub-tasks
• Dynamically instantiate domain-specific agents for coordinated execution
• Swarm mode performance: 62.3% BrowseComp, 19.4% WideSearch
• Complex research workflows with parallel information gathering and synthesis
• Multi-agent coding projects with specialized sub-agents for different components
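
The decompose-then-run-in-parallel pattern described above can be sketched on the client side with a thread pool. This illustrates the orchestration pattern only and does not describe how Agent Swarm is implemented inside the model: the planner, the agent roles, and `run_sub_agent` are hypothetical stand-ins for model calls.

```python
from concurrent.futures import ThreadPoolExecutor


def decompose(task: str) -> list[tuple[str, str]]:
    """Hypothetical planner: split a task into (agent_role, sub_task) pairs."""
    return [
        ("searcher", f"collect sources for: {task}"),
        ("analyst", f"extract key figures for: {task}"),
        ("writer", f"draft a summary of: {task}"),
    ]


def run_sub_agent(role: str, sub_task: str) -> str:
    """Stand-in for a model call made with a role-specific system prompt."""
    return f"[{role}] done: {sub_task}"


def run_swarm(task: str, max_workers: int = 4) -> list[str]:
    """Run every sub-task concurrently and return results in plan order."""
    sub_tasks = decompose(task)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(run_sub_agent, role, sub) for role, sub in sub_tasks]
        # Collect in submission order so the output is deterministic.
        return [future.result() for future in futures]


results = run_swarm("compare MoE language models")
for line in results:
    print(line)
```

Threads suit this sketch because real sub-agent calls would spend most of their time waiting on network responses; a final synthesis step would typically merge the collected results into one answer.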

Visual Understanding & Analysis:
• Native image and video understanding with 400M parameter MoonViT encoder
• Chart and graph reasoning: 77.5% CharXiv reasoning questions
• Document understanding and visual question answering
• Scientific visualization analysis and interpretation
• UI/UX design understanding for code generation

Agentic Search & Web Reasoning:
• Goal-directed web-based reasoning with visual content understanding
• Continuous browsing, searching, and reasoning over multimodal web information
• 62.3% BrowseComp in swarm mode with coordinated sub-agent exploration
• Visual content extraction and analysis from web sources

Long-Horizon Multimodal Workflows:
• Research automation across text and visual sources
• Video analysis workflows with tool-augmented reasoning
• Complex design-to-implementation pipelines
• Multi-step visual data processing and code generation
• 79.3% AA-LCR (avg@3), 69.4% LongBench-v2 with 128K context

Creative & Multimodal Content Generation:
• Image-grounded creative writing and storytelling
• Visual analysis and cultural commentary
• Technical documentation from visual specifications
• Educational content combining visual and textual explanations

Looking for production scale? Deploy on a dedicated endpoint

Deploy Kimi K2.5 on a dedicated endpoint with custom hardware configuration, as many instances as you need, and auto-scaling.
