Introducing Kimi K2.5
Kimi K2.5 is Moonshot AI's most capable open-source thinking model, built as a multimodal thinking agent that reasons step-by-step while dynamically invoking tools. Setting new state-of-the-art records on Humanity's Last Exam (HLE), BrowseComp, and other benchmarks, it dramatically scales multi-step reasoning depth while maintaining stable tool use across 200–300 sequential calls, a breakthrough in long-horizon agency, with native INT4 quantization delivering 2x inference speed.
Kimi K2.5 API Usage
Endpoint
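No live serverless endpoint is listed on this page for Kimi K2.5. If the model becomes available on Together AI, requests would go through the standard OpenAI-compatible chat completions endpoint at https://api.together.xyz/v1/chat/completions, using the model identifier shown on the Models page. The identifier moonshotai/Kimi-K2.5 used in the sketches below is an assumed placeholder, not a confirmed ID.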
How to use Kimi K2.5
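Below is a minimal sketch of calling the model through the Together Python SDK, assuming the model is served under the placeholder identifier moonshotai/Kimi-K2.5 and that TOGETHER_API_KEY is set in the environment.

# Minimal sketch: chat completion via the Together Python SDK (pip install together).
# The model identifier is an assumed placeholder; check the Models page for the real ID.
from together import Together

client = Together()  # reads TOGETHER_API_KEY from the environment

response = client.chat.completions.create(
    model="moonshotai/Kimi-K2.5",  # placeholder model ID
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain step by step how Mixture-of-Experts routing works."},
    ],
    max_tokens=512,
)

print(response.choices[0].message.content)

Streaming works the same way: pass stream=True and iterate over the returned chunks instead of reading the full response at once.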
Model details
Architecture Overview:
• Mixture-of-Experts (MoE) architecture with 1T total parameters and 32B activated parameters
• 61 total layers (including 1 dense layer), with 384 experts and 8 selected per token (hyperparameters summarized in the sketch after this list)
• Multi-head Latent Attention (MLA) with an attention hidden dimension of 7168
• Native vision encoder: MoonViT with 400M parameters for vision-language integration
• Native INT4 quantization applied to MoE components through Quantization-Aware Training (QAT)
• 256K context window enabling complex long-horizon multimodal agentic tasks
• 160K vocabulary size with SwiGLU activation function
• Unified architecture combining vision and text, instant and thinking modes, conversational and agentic paradigms
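For quick reference, the hyperparameters above can be gathered into a single configuration sketch. The field names below are illustrative rather than Moonshot's actual config keys; only the values come from the list above.

# Illustrative summary of the architecture described above (not an official config).
from dataclasses import dataclass

@dataclass
class KimiK25ArchSketch:
    total_params: str = "1T"              # total Mixture-of-Experts parameters
    active_params: str = "32B"            # parameters activated per token
    num_layers: int = 61                  # includes 1 dense layer
    num_experts: int = 384                # experts per MoE layer
    experts_per_token: int = 8            # top-8 routing
    attention_hidden_dim: int = 7168      # Multi-head Latent Attention (MLA)
    vision_encoder: str = "MoonViT, 400M parameters"
    context_window: str = "256K tokens"
    vocab_size: str = "160K"
    activation: str = "SwiGLU"
    moe_quantization: str = "native INT4 via QAT"

print(KimiK25ArchSketch())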
Training Methodology:
• Continual pretraining on approximately 15 trillion mixed visual and text tokens atop Kimi-K2-Base
• Native multimodal training—pre-trained on vision-language tokens for seamless cross-modal reasoning
• End-to-end trained to interleave chain-of-thought reasoning with function calls and visual grounding
• Quantization-Aware Training (QAT) employed for lossless INT4 inference at 2x speed (a generic illustration follows this list)
• Agent Swarm training—transitions from single-agent scaling to self-directed, coordinated swarm-like execution
• Specialized training for parallel task decomposition and domain-specific agent instantiation
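Quantization-Aware Training in general simulates low-precision arithmetic in the forward pass so the weights adapt to it during training. The snippet below is a generic illustration of symmetric INT4 fake quantization with a straight-through estimator, not Moonshot's actual recipe.

# Generic symmetric INT4 fake quantization, as used in typical QAT setups.
# Illustrative only; this is not Moonshot's implementation.
import numpy as np

def fake_quant_int4(w: np.ndarray) -> np.ndarray:
    """Round weights to a 4-bit symmetric grid, then dequantize."""
    qmax = 7                                   # symmetric int4 levels: -7 ... 7
    scale = np.max(np.abs(w)) / qmax + 1e-12   # per-tensor scale
    q = np.clip(np.round(w / scale), -qmax, qmax)
    return q * scale                           # the values the forward pass actually sees

# During QAT the forward pass uses the fake-quantized weights while gradients
# update the underlying full-precision weights (straight-through estimator),
# so the trained model can later be served directly in INT4 with minimal quality loss.
w = np.random.randn(4, 4).astype(np.float32)
print(fake_quant_int4(w))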
Key Capabilities:
• Native Multimodality: Excels in visual knowledge, cross-modal reasoning, and agentic tool use grounded in visual inputs
• Coding with Vision: Generates code from visual specifications (UI designs, video workflows) and autonomously chains tools for visual data processing
• Agent Swarm: Decomposes complex tasks into parallel sub-tasks executed by dynamically instantiated, domain-specific agents
• Vision benchmarks: 78.5% MMMU-Pro, 84.2% MathVision, 90.1% MathVista, 77.5% CharXiv reasoning
Performance Characteristics:
• State-of-the-art 50.2% on Humanity's Last Exam (HLE) with tools across 100+ expert subjects
• Advanced mathematical reasoning: 96.1% AIME 2025, 95.4% HMMT 2025, 81.8% IMO-AnswerBench, 87.4% GPQA-Diamond
• Strong coding capabilities: 76.8% SWE-Bench Verified, 73.0% SWE-Bench Multilingual, 85.0% LiveCodeBench v6
• Agentic search with swarm: 78.4% BrowseComp (swarm mode), 57.5% Seal-0
• Long-context excellence: 79.3% on AA-LCR (avg@3), 69.4% LongBench-v2 (128K context)
• 2x generation speed improvement through native INT4 quantization without performance degradation
Applications & Use Cases
Multimodal Agentic Reasoning:
• Expert-level reasoning across 100+ subjects achieving 50.2% on Humanity's Last Exam with tools
• Vision-grounded reasoning: 78.5% MMMU-Pro, 84.2% MathVision, 90.1% MathVista
• Cross-modal problem solving combining visual understanding with mathematical and logical reasoning
• Competition-level mathematical problem solving: 96.1% AIME 2025, 95.4% HMMT 2025
• Dynamic hypothesis generation from visual and textual inputs with evidence verification
Coding with Vision:
• Generate code from visual specifications: UI designs, mockups, and video workflows (see the sketch after this list)
• Autonomous tool chaining for visual data processing and analysis
• Production-level coding: 76.8% SWE-Bench Verified, 73.0% SWE-Bench Multilingual
• Frontend development from visual designs: fully functional HTML, React, and responsive web applications
• Video-to-code generation: analyze video workflows and generate implementation code
• Competitive programming: 85.0% LiveCodeBench v6, 53.6% OJ-Bench
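As a concrete design-to-code example of the use cases above, a UI mockup can be sent alongside a text instruction using the OpenAI-compatible multimodal message format. The model ID and image URL below are placeholders, and image-input support on Together's endpoint for this model is an assumption, not something this page confirms.

# Hypothetical sketch: asking the model to turn a UI mockup into React code.
from together import Together

client = Together()

response = client.chat.completions.create(
    model="moonshotai/Kimi-K2.5",  # placeholder model ID
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Implement this mockup as a responsive React component."},
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/mockup.png"}},  # placeholder image
            ],
        }
    ],
    max_tokens=2048,
)

print(response.choices[0].message.content)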
Agent Swarm Orchestration:
• Self-directed task decomposition into parallel sub-tasks
• Dynamically instantiate domain-specific agents for coordinated execution (a tool-calling sketch follows this list)
• Swarm mode performance: 62.3% BrowseComp, 19.4% WideSearch
• Complex research workflows with parallel information gathering and synthesis
• Multi-agent coding projects with specialized sub-agents for different components
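Agentic behavior like the swarm orchestration above is ultimately driven through tool (function) calling. The sketch below shows a generic OpenAI-style tool definition passed through the Together SDK with a hypothetical web_search tool; it illustrates the request format only and is not Moonshot's swarm orchestration layer.

# Hypothetical sketch of tool calling with an OpenAI-style `tools` schema.
# The web_search tool and model ID are placeholders.
from together import Together

client = Together()

tools = [{
    "type": "function",
    "function": {
        "name": "web_search",  # hypothetical tool
        "description": "Search the web and return the top results.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}]

response = client.chat.completions.create(
    model="moonshotai/Kimi-K2.5",  # placeholder model ID
    messages=[{"role": "user", "content": "Find and compare the three most cited MoE papers from 2024."}],
    tools=tools,
)

# If the model decides to call a tool, the arguments appear here. An agent loop
# would execute the tool, append the result as a "tool" role message, and repeat
# until the model produces a final answer.
print(response.choices[0].message.tool_calls)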
Visual Understanding & Analysis:
• Native image and video understanding with 400M parameter MoonViT encoder
• Chart and graph reasoning: 77.5% CharXiv reasoning questions
• Document understanding and visual question answering
• Scientific visualization analysis and interpretation
• UI/UX design understanding for code generation
Agentic Search & Web Reasoning:
• Goal-directed web-based reasoning with visual content understanding
• Continuous browsing, searching, and reasoning over multimodal web information
• 62.3% BrowseComp in swarm mode with coordinated sub-agent exploration
• Visual content extraction and analysis from web sources
Long-Horizon Multimodal Workflows:
• Research automation across text and visual sources
• Video analysis workflows with tool-augmented reasoning
• Complex design-to-implementation pipelines
• Multi-step visual data processing and code generation
• 79.3% AA-LCR (avg@3), 69.4% LongBench-v2 with 128K context
Creative & Multimodal Content Generation:
• Image-grounded creative writing and storytelling
• Visual analysis and cultural commentary
• Technical documentation from visual specifications
• Educational content combining visual and textual explanations

