Kimi K2.6
Native multimodal agentic model with long-horizon coding and Agent Swarm
About model
Kimi K2.6 is Moonshot AI's native multimodal agentic model built on a 1T parameter (32B activated) MoE architecture with 256K context. It delivers long-horizon coding stability across multiple languages and domains, Agent Swarm orchestration scaling to 300 sub-agents with 4,000 coordinated steps, and proactive autonomous execution for persistent background agents. The model supports text, image, and video input with thinking mode for multi-step reasoning and tool invocation.
54.0% HLE-Full (w/ tools): expert-level multimodal reasoning across 100+ subjects
300 sub-agents: 4,000 coordinated steps for parallel task decomposition
80.2% SWE-Bench Verified: long-horizon coding across languages and domains
- Long-Horizon Coding: Stable end-to-end coding across Rust, Go, Python, frontend, DevOps, and performance optimization with 80.2% SWE-Bench Verified and 89.6% LiveCodeBench v6
- Agent Swarm: Scales to 300 sub-agents executing 4,000 coordinated steps, decomposing complex tasks into parallel domain-specialized subtasks for end-to-end autonomous output
- Multimodal Understanding: Native text, image, and video input via MoonViT encoder with 79.4% MMMU-Pro and coding-driven design from visual inputs to production interfaces
- Proactive Autonomous Execution: Persistent background agents managing schedules, code execution, and cross-platform operations with 73.1% OSWorld-Verified
| Model | AIME 2025 | GPQA Diamond | HLE | LiveCodeBench | MATH500 | SWE-bench Verified |
|---|---|---|---|---|---|---|
| Kimi K2.6 | 96.40% | 90.50% | 34.70% | 89.60% | | 80.20% |
API usage
Endpoint: moonshotai/Kimi-K2.6
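A minimal chat completion request looks like the following sketch. It assumes the Together Python SDK (`pip install together`) and a `TOGETHER_API_KEY` environment variable; the prompt is illustrative.

```python
# Minimal chat completion against Kimi K2.6 on Together AI.
# Assumes `pip install together` and TOGETHER_API_KEY in the environment.
from together import Together

client = Together()  # picks up TOGETHER_API_KEY automatically

response = client.chat.completions.create(
    model="moonshotai/Kimi-K2.6",
    messages=[
        {"role": "user", "content": "Explain MoE expert routing in two sentences."}
    ],
)
print(response.choices[0].message.content)
```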
Model card
Architecture Overview:
• 1T total parameter MoE architecture with 32B parameters activated per token
• 384 experts with 8 selected per token plus 1 shared expert (routing sketched below), using Multi-head Latent Attention (MLA)
• 256K token context window with native multimodal support for text, image, and video input
• MoonViT vision encoder (400M parameters) for image and video understanding
• Thinking mode for multi-step reasoning and tool invocation
• Native INT4 quantization
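To make the routing concrete, here is a toy sketch of top-8 routed-expert selection with an always-on shared expert. It is an illustration only, not Moonshot AI's implementation; the hidden size, the toy linear experts, and the softmax-weighted combination are assumptions.

```python
# Toy sketch of the routed-MoE layer described above: 384 routed experts,
# top-8 selected per token, plus one shared expert applied to every token.
# Illustration only; hidden size and expert definitions are assumptions.
import torch
import torch.nn.functional as F

NUM_EXPERTS, TOP_K, D_MODEL = 384, 8, 64  # D_MODEL shrunk for the demo

gate = torch.nn.Linear(D_MODEL, NUM_EXPERTS, bias=False)
experts = torch.nn.ModuleList(
    torch.nn.Linear(D_MODEL, D_MODEL) for _ in range(NUM_EXPERTS)
)
shared_expert = torch.nn.Linear(D_MODEL, D_MODEL)

@torch.no_grad()
def moe_forward(x: torch.Tensor) -> torch.Tensor:
    """x: (tokens, D_MODEL) -> (tokens, D_MODEL)."""
    top_vals, top_idx = gate(x).topk(TOP_K, dim=-1)  # route: 8 experts/token
    weights = F.softmax(top_vals, dim=-1)            # normalize the 8 gate scores
    out = shared_expert(x)                           # shared expert sees all tokens
    for t in range(x.size(0)):                       # naive loop, fine for a demo
        for k in range(TOP_K):
            out[t] += weights[t, k] * experts[int(top_idx[t, k])](x[t])
    return out

print(moe_forward(torch.randn(4, D_MODEL)).shape)  # torch.Size([4, 64])
```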
Training Methodology:
• Built on the Kimi K2.5 architecture with targeted improvements for long-horizon coding stability
• Enhanced reinforcement learning for coding task distributions across Rust, Go, Python, frontend, DevOps, and performance optimization
• Improved instruction compliance and self-correction capabilities for complex software engineering tasks
Performance Characteristics:
• 54.0% HLE-Full w/ tools for expert-level multimodal reasoning
• 80.2% SWE-Bench Verified, 58.6% SWE-Bench Pro, 76.7% SWE-Bench Multilingual
• 89.6% LiveCodeBench v6 for code generation
• 96.4% AIME 2025, 90.5% GPQA-Diamond for reasoning
• 83.2% BrowseComp (86.3% with Agent Swarm) for agentic search
• 73.1% OSWorld-Verified for autonomous computer use
• Agent Swarm: 300 sub-agents executing 4,000 coordinated steps
Prompting
Together AI API Access:
• Access Kimi K2.6 via Together AI APIs using the endpoint moonshotai/Kimi-K2.6
• Authenticate using your Together AI API key in request headers
• Supports thinking mode, tool calling, image input, and video input (an image-input request is sketched below)
• Available on both serverless and dedicated infrastructure
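An image-input request, sketched under the assumption that Together's standard OpenAI-compatible content-part message format applies to this model; the image URL is a placeholder.

```python
# Image-input request sketch, assuming Together's OpenAI-compatible
# content-part message format; the image URL is a placeholder.
from together import Together

client = Together()

response = client.chat.completions.create(
    model="moonshotai/Kimi-K2.6",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Turn this mockup into an HTML/CSS layout."},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/mockup.png"},
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)
```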
Applications & use cases
Long-Horizon Coding:
• Complex end-to-end software engineering across Rust, Go, Python, frontend, DevOps, and performance optimization
• 80.2% SWE-Bench Verified with improved stability on long-running coding tasks
• Coding-driven design: transforms prompts and visual inputs into production-ready interfaces and full-stack workflows
Agentic Workflows:
• Agent Swarm: 300 sub-agents executing 4,000 coordinated steps for parallel task decomposition (see the fan-out sketch after this list)
• Proactive autonomous execution for persistent background agents managing schedules, code, and cross-platform operations
• 73.1% OSWorld-Verified for autonomous computer use
• Multi-step tool invocation with thinking mode for complex problem solving
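The fan-out pattern behind swarm-style decomposition can be sketched with plain thread-level parallelism. This illustrates the pattern only; it is not Moonshot's Agent Swarm orchestrator, and the subtask prompts and worker count are assumptions.

```python
# Swarm-style fan-out sketch: split a task into subtasks and run one
# scoped Kimi K2.6 call per sub-agent in parallel. Illustration only;
# this is not Moonshot's Agent Swarm orchestrator.
from concurrent.futures import ThreadPoolExecutor
from together import Together

client = Together()
MODEL = "moonshotai/Kimi-K2.6"

def run_subagent(subtask: str) -> str:
    """One sub-agent = one scoped chat completion."""
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": subtask}],
    )
    return resp.choices[0].message.content

# Hypothetical decomposition of a larger engineering task.
subtasks = [
    "Write a Rust module that parses nginx access logs into structs.",
    "Write a Go HTTP handler that serves aggregate log statistics as JSON.",
    "Write a Python script that load-tests the Go endpoint.",
]

with ThreadPoolExecutor(max_workers=len(subtasks)) as pool:
    results = list(pool.map(run_subagent, subtasks))  # parallel fan-out

for task, result in zip(subtasks, results):
    print(f"### {task}\n{result[:200]}\n")
```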
Multimodal Reasoning:
• Native image and video understanding via 400M parameter MoonViT encoder
• 79.4% MMMU-Pro for multimodal understanding
• 87.4% MathVision for visual mathematical reasoning
• 256K context for processing large codebases, documents, and visual inputs
- Model provider: Moonshot AI
- Type: Reasoning, Vision, Chat, Code, LLM
- Main use cases: Reasoning
- Features: Function Calling, JSON Mode
- Speed: High
- Intelligence: High
- Deployment: Serverless, Monthly Reserved
- Endpoint: moonshotai/Kimi-K2.6
- Parameters: 1T
- Activated parameters: 32B
- Context length: 256K
- Input price: $1.20 / 1M tokens ($0.20 / 1M cached)
- Output price: $4.50 / 1M tokens
- Input modalities: Text, Image, Video
- Output modalities: Text
- Released: April 20, 2026
- Quantization level: FP4
- Category: Chat