Introducing Kimi K2.5
Kimi K2.5 is Moonshot AI's most capable open-source thinking model, built as a multimodal thinking agent that reasons step-by-step while dynamically invoking tools. Setting new state-of-the-art records on Humanity's Last Exam (HLE), BrowseComp, and other benchmarks, it dramatically scales multi-step reasoning depth while maintaining stable tool use across 200–300 sequential calls, a breakthrough in long-horizon agency, with native INT4 quantization delivering 2x inference speed.
Kimi K2.5 API Usage
Endpoint
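No live serverless endpoint is listed on this page for Kimi K2.5. If the model becomes available on Together AI, requests would go through the standard OpenAI-compatible chat completions endpoint at https://api.together.xyz/v1/chat/completions, using the model identifier shown on the Models page. The identifier moonshotai/Kimi-K2.5 used in the sketches below is an assumed placeholder, not a confirmed ID.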
How to use Kimi K2.5
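Below is a minimal sketch of calling the model through the Together Python SDK, assuming the model is served under the placeholder identifier moonshotai/Kimi-K2.5 and that TOGETHER_API_KEY is set in the environment.

# Minimal sketch: chat completion via the Together Python SDK (pip install together).
# The model identifier is an assumed placeholder; check the Models page for the real ID.
from together import Together

client = Together()  # reads TOGETHER_API_KEY from the environment

response = client.chat.completions.create(
    model="moonshotai/Kimi-K2.5",  # placeholder model ID
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain step by step how Mixture-of-Experts routing works."},
    ],
    max_tokens=512,
)

print(response.choices[0].message.content)

Streaming works the same way: pass stream=True and iterate over the returned chunks instead of reading the full response at once.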
Model details
Architecture Overview:
• Mixture-of-Experts (MoE) architecture with 1T total parameters and 32B activated parameters
• 61 total layers (including 1 dense layer), with 384 experts and 8 selected per token (hyperparameters summarized in the sketch after this list)
• Multi-head Latent Attention (MLA) with an attention hidden dimension of 7168
• Native vision encoder: MoonViT with 400M parameters for vision-language integration
• Native INT4 quantization applied to MoE components through Quantization-Aware Training (QAT)
• 256K context window enabling complex long-horizon multimodal agentic tasks
• 160K vocabulary size with SwiGLU activation function
• Unified architecture combining vision and text, instant and thinking modes, conversational and agentic paradigms
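For quick reference, the hyperparameters above can be gathered into a single configuration sketch. The field names below are illustrative rather than Moonshot's actual config keys; only the values come from the list above.

# Illustrative summary of the architecture described above (not an official config).
from dataclasses import dataclass

@dataclass
class KimiK25ArchSketch:
    total_params: str = "1T"              # total Mixture-of-Experts parameters
    active_params: str = "32B"            # parameters activated per token
    num_layers: int = 61                  # includes 1 dense layer
    num_experts: int = 384                # experts per MoE layer
    experts_per_token: int = 8            # top-8 routing
    attention_hidden_dim: int = 7168      # Multi-head Latent Attention (MLA)
    vision_encoder: str = "MoonViT, 400M parameters"
    context_window: str = "256K tokens"
    vocab_size: str = "160K"
    activation: str = "SwiGLU"
    moe_quantization: str = "native INT4 via QAT"

print(KimiK25ArchSketch())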
Training Methodology:
• Continual pretraining on approximately 15 trillion mixed visual and text tokens atop Kimi-K2-Base
• Native multimodal training—pre-trained on vision-language tokens for seamless cross-modal reasoning
• End-to-end trained to interleave chain-of-thought reasoning with function calls and visual grounding
• Quantization-Aware Training (QAT) employed for lossless INT4 inference at 2x speed (a generic illustration follows this list)
• Agent Swarm training—transitions from single-agent scaling to self-directed, coordinated swarm-like execution
• Specialized training for parallel task decomposition and domain-specific agent instantiation
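Quantization-Aware Training in general simulates low-precision arithmetic in the forward pass so the weights adapt to it during training. The snippet below is a generic illustration of symmetric INT4 fake quantization with a straight-through estimator, not Moonshot's actual recipe.

# Generic symmetric INT4 fake quantization, as used in typical QAT setups.
# Illustrative only; this is not Moonshot's implementation.
import numpy as np

def fake_quant_int4(w: np.ndarray) -> np.ndarray:
    """Round weights to a 4-bit symmetric grid, then dequantize."""
    qmax = 7                                   # symmetric int4 levels: -7 ... 7
    scale = np.max(np.abs(w)) / qmax + 1e-12   # per-tensor scale
    q = np.clip(np.round(w / scale), -qmax, qmax)
    return q * scale                           # the values the forward pass actually sees

# During QAT the forward pass uses the fake-quantized weights while gradients
# update the underlying full-precision weights (straight-through estimator),
# so the trained model can later be served directly in INT4 with minimal quality loss.
w = np.random.randn(4, 4).astype(np.float32)
print(fake_quant_int4(w))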
Key Capabilities:
• Native Multimodality: Excels in visual knowledge, cross-modal reasoning, and agentic tool use grounded in visual inputs
• Coding with Vision: Generates code from visual specifications (UI designs, video workflows) and autonomously chains tools for visual data processing
• Agent Swarm: Decomposes complex tasks into parallel sub-tasks executed by dynamically instantiated, domain-specific agents
• Vision benchmarks: 78.5% MMMU-Pro, 84.2% MathVision, 90.1% MathVista, 77.5% CharXiv reasoning
Performance Characteristics:
• State-of-the-art 50.2% on Humanity's Last Exam (HLE) with tools across 100+ expert subjects
• Advanced mathematical reasoning: 96.1% AIME 2025, 95.4% HMMT 2025, 81.8% IMO-AnswerBench, 87.4% GPQA-Diamond
• Strong coding capabilities: 76.8% SWE-Bench Verified, 73.0% SWE-Bench Multilingual, 85.0% LiveCodeBench v6
• Agentic search with swarm: 78.4% BrowseComp (swarm mode), 57.5% Seal-0
• Long-context excellence: 79.3% on AA-LCR (avg@3), 69.4% LongBench-v2 (128K context)
• 2x generation speed improvement through native INT4 quantization without performance degradation
Applications & Use Cases
Multimodal Agentic Reasoning:
• Expert-level reasoning across 100+ subjects achieving 50.2% on Humanity's Last Exam with tools
• Vision-grounded reasoning: 78.5% MMMU-Pro, 84.2% MathVision, 90.1% MathVista
• Cross-modal problem solving combining visual understanding with mathematical and logical reasoning
• Competition-level mathematical problem solving: 96.1% AIME 2025, 95.4% HMMT 2025
• Dynamic hypothesis generation from visual and textual inputs with evidence verification
Coding with Vision:
• Generate code from visual specifications: UI designs, mockups, and video workflows (see the sketch after this list)
• Autonomous tool chaining for visual data processing and analysis
• Production-level coding: 76.8% SWE-Bench Verified, 73.0% SWE-Bench Multilingual
• Frontend development from visual designs: fully functional HTML, React, and responsive web applications
• Video-to-code generation: analyze video workflows and generate implementation code
• Competitive programming: 85.0% LiveCodeBench v6, 53.6% OJ-Bench
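As a concrete design-to-code example of the use cases above, a UI mockup can be sent alongside a text instruction using the OpenAI-compatible multimodal message format. The model ID and image URL below are placeholders, and image-input support on Together's endpoint for this model is an assumption, not something this page confirms.

# Hypothetical sketch: asking the model to turn a UI mockup into React code.
from together import Together

client = Together()

response = client.chat.completions.create(
    model="moonshotai/Kimi-K2.5",  # placeholder model ID
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Implement this mockup as a responsive React component."},
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/mockup.png"}},  # placeholder image
            ],
        }
    ],
    max_tokens=2048,
)

print(response.choices[0].message.content)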
Agent Swarm Orchestration:
• Self-directed task decomposition into parallel sub-tasks
• Dynamically instantiate domain-specific agents for coordinated execution (a tool-calling sketch follows this list)
• Swarm mode performance: 62.3% BrowseComp, 19.4% WideSearch
• Complex research workflows with parallel information gathering and synthesis
• Multi-agent coding projects with specialized sub-agents for different components
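Agentic behavior like the swarm orchestration above is ultimately driven through tool (function) calling. The sketch below shows a generic OpenAI-style tool definition passed through the Together SDK with a hypothetical web_search tool; it illustrates the request format only and is not Moonshot's swarm orchestration layer.

# Hypothetical sketch of tool calling with an OpenAI-style `tools` schema.
# The web_search tool and model ID are placeholders.
from together import Together

client = Together()

tools = [{
    "type": "function",
    "function": {
        "name": "web_search",  # hypothetical tool
        "description": "Search the web and return the top results.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}]

response = client.chat.completions.create(
    model="moonshotai/Kimi-K2.5",  # placeholder model ID
    messages=[{"role": "user", "content": "Find and compare the three most cited MoE papers from 2024."}],
    tools=tools,
)

# If the model decides to call a tool, the arguments appear here. An agent loop
# would execute the tool, append the result as a "tool" role message, and repeat
# until the model produces a final answer.
print(response.choices[0].message.tool_calls)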
Visual Understanding & Analysis:
• Native image and video understanding with 400M parameter MoonViT encoder
• Chart and graph reasoning: 77.5% CharXiv reasoning questions
• Document understanding and visual question answering
• Scientific visualization analysis and interpretation
• UI/UX design understanding for code generation
Agentic Search & Web Reasoning:
• Goal-directed web-based reasoning with visual content understanding
• Continuous browsing, searching, and reasoning over multimodal web information
• 62.3% BrowseComp in swarm mode with coordinated sub-agent exploration
• Visual content extraction and analysis from web sources
Long-Horizon Multimodal Workflows:
• Research automation across text and visual sources
• Video analysis workflows with tool-augmented reasoning
• Complex design-to-implementation pipelines
• Multi-step visual data processing and code generation
• 79.3% AA-LCR (avg@3), 69.4% LongBench-v2 with 128K context
Creative & Multimodal Content Generation:
• Image-grounded creative writing and storytelling
• Visual analysis and cultural commentary
• Technical documentation from visual specifications
• Educational content combining visual and textual explanations

