Models / Moonshot AI

Kimi K2.6

Native multimodal agentic model with long-horizon coding and Agent Swarm

About model

Kimi K2.6 is Moonshot AI's native multimodal agentic model built on a 1T parameter (32B activated) MoE architecture with 256K context. It delivers long-horizon coding stability across multiple languages and domains, Agent Swarm orchestration scaling to 300 sub-agents with 4,000 coordinated steps, and proactive autonomous execution for persistent background agents. The model supports text, image, and video input with thinking mode for multi-step reasoning and tool invocation.

  • HLE-Full (w/ tools): 54.0% (expert-level multimodal reasoning across 100+ subjects)
  • Agent Swarm sub-agents: 300 (4,000 coordinated steps for parallel task decomposition)
  • SWE-Bench Verified: 80.2% (long-horizon coding across languages and domains)

Key model capabilities
  • Long-Horizon Coding: Stable end-to-end coding across Rust, Go, Python, frontend, DevOps, and performance optimization with 80.2% SWE-Bench Verified and 89.6% LiveCodeBench v6
  • Agent Swarm: Scales to 300 sub-agents executing 4,000 coordinated steps, decomposing complex tasks into parallel domain-specialized subtasks for end-to-end autonomous output
  • Multimodal Understanding: Native text, image, and video input via MoonViT encoder with 79.4% MMMU-Pro and coding-driven design from visual inputs to production interfaces
  • Proactive Autonomous Execution: Persistent background agents managing schedules, code execution, and cross-platform operations with 73.1% OSWorld-Verified
Performance benchmarks

Closed-source competitor models are included for comparison; cells without a reported score are marked n/a.

| Model | AIME 2025 | GPQA Diamond | HLE | LiveCodeBench | MATH500 | SWE-bench Verified |
|---|---|---|---|---|---|---|
| Kimi K2.6 | 96.4% | 90.5% | 34.7% | 89.6% | n/a | 80.2% |
| Claude Opus 4.6 | n/a | 90.5% | 34.2% | n/a | n/a | 78.7% |
| OpenAI o3 | n/a | 83.3% | 24.9% | n/a | 99.2% | 62.3% |
| OpenAI o1 | n/a | 76.8% | n/a | n/a | 96.4% | 48.9% |
| GPT-4o | n/a | 49.2% | 2.7% | 32.3% | 89.3% | 31.0% |

  • API usage

    • cURL
    • Python
    • TypeScript

    Endpoint:

    moonshotai/Kimi-K2.6

    curl -X POST "https://api.together.xyz/v1/chat/completions" \
      -H "Authorization: Bearer $TOGETHER_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{
        "model": "moonshotai/Kimi-K2.6",
        "messages": [
          {
            "role": "user",
            "content": "What are some fun things to do in New York?"
          }
        ]
      }'
    
    from together import Together
    
    client = Together()
    
    response = client.chat.completions.create(
      model="moonshotai/Kimi-K2.6",
      messages=[
        {
          "role": "user",
          "content": "What are some fun things to do in New York?"
        }
      ]
    )
    print(response.choices[0].message.content)
    
    import Together from 'together-ai';
    const together = new Together();
    
    const completion = await together.chat.completions.create({
      model: 'moonshotai/Kimi-K2.6',
      messages: [
        {
          role: 'user',
          content: 'What are some fun things to do in New York?'
        }
      ],
    });
    
    console.log(completion.choices[0].message.content);
    
  • Model card

    Architecture Overview:
    • 1T total parameter MoE architecture with 32B parameters activated per token
    • 384 experts with 8 selected per token plus 1 shared expert, using Multi-head Latent Attention (MLA)
    • 256K token context window with native multimodal support for text, image, and video input
    • MoonViT vision encoder (400M parameters) for image and video understanding
    • Thinking mode for multi-step reasoning and tool invocation
    • Native INT4 quantization
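The expert selection described above (8 of 384 routed experts per token, plus 1 always-active shared expert) can be sketched as a top-k routing step. This is a toy illustration only: the random logits stand in for a learned router's scores, and the softmax-over-winners gating is an assumption, not Moonshot's published design.

```python
import numpy as np

NUM_EXPERTS = 384  # routed experts in the MoE layer (from the model card)
TOP_K = 8          # routed experts selected per token; 1 shared expert is always active

def route_token(router_logits: np.ndarray, top_k: int = TOP_K):
    """Pick the top-k experts for one token and softmax-normalize their gates.
    Illustrative only: the real router's scoring/normalization is not public."""
    top_idx = np.argsort(router_logits)[-top_k:][::-1]  # indices of the k largest logits
    gates = np.exp(router_logits[top_idx] - router_logits[top_idx].max())
    gates /= gates.sum()  # gate weights over the selected experts sum to 1
    return top_idx, gates

rng = np.random.default_rng(0)
token_logits = rng.standard_normal(NUM_EXPERTS)  # stand-in for a learned router's scores
experts, gates = route_token(token_logits)
print(len(experts), float(gates.sum()))  # 8 routed experts, gates summing to 1
```

Only the 8 selected experts (about 32B of the 1T parameters, together with the shared expert and attention) run for each token, which is what keeps per-token compute far below the total parameter count.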

    Training Methodology:
    • Built on the Kimi K2.5 architecture with targeted improvements for long-horizon coding stability
    • Enhanced reinforcement learning for coding task distributions across Rust, Go, Python, frontend, DevOps, and performance optimization
    • Improved instruction compliance and self-correction capabilities for complex software engineering tasks

    Performance Characteristics:
    • 54.0% HLE-Full w/ tools for expert-level multimodal reasoning
    • 80.2% SWE-Bench Verified, 58.6% SWE-Bench Pro, 76.7% SWE-Bench Multilingual
    • 89.6% LiveCodeBench v6 for code generation
    • 96.4% AIME 2026, 90.5% GPQA-Diamond for reasoning
    • 83.2% BrowseComp (86.3% with Agent Swarm) for agentic search
    • 73.1% OSWorld-Verified for autonomous computer use
    • Agent Swarm: 300 sub-agents executing 4,000 coordinated steps

  • Prompting

    Together AI API Access:
    • Access Kimi K2.6 via Together AI APIs using the endpoint moonshotai/Kimi-K2.6
    • Authenticate using your Together AI API key in request headers
    • Supports thinking mode, tool calling, image input, and video input
    • Available on both serverless and dedicated infrastructure
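Since image input goes through the OpenAI-compatible chat schema, a mixed text-and-image message can be sketched as below. The request body is only constructed, not sent; the `image_url` content-part shape is assumed from the OpenAI-style convention, and the URL is a placeholder.

```python
# Sketch of a multimodal chat request body (built locally, not sent).
# The image_url content-part format follows the OpenAI-compatible schema;
# the URL below is a placeholder, not a real asset.
payload = {
    "model": "moonshotai/Kimi-K2.6",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Turn this UI mockup into a working HTML/CSS page."},
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/mockup.png"}},
            ],
        }
    ],
}

parts = payload["messages"][0]["content"]
print(payload["model"], [p["type"] for p in parts])
```

The same `messages` structure drops into any of the curl, Python, or TypeScript snippets shown under API usage.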

  • Applications & use cases

    Long-Horizon Coding:
    • Complex end-to-end software engineering across Rust, Go, Python, frontend, DevOps, and performance optimization
    • 80.2% SWE-Bench Verified with improved stability on long-running coding tasks
    • Coding-driven design: transforms prompts and visual inputs into production-ready interfaces and full-stack workflows

    Agentic Workflows:
    • Agent Swarm: 300 sub-agents executing 4,000 coordinated steps for parallel task decomposition
    • Proactive autonomous execution for persistent background agents managing schedules, code, and cross-platform operations
    • 73.1% OSWorld-Verified for autonomous computer use
    • Multi-step tool invocation with thinking mode for complex problem solving
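Tool invocation uses the standard function-calling schema (Function Calling is a listed feature of this model). The sketch below only builds the `tools` definition a request would carry; the `get_weather` function and its parameters are hypothetical examples, not a real API.

```python
# Build a tools definition for a chat completion request.
# The get_weather function and its parameters are hypothetical examples.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Look up current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"},
                },
                "required": ["city"],
            },
        },
    }
]

request = {
    "model": "moonshotai/Kimi-K2.6",
    "messages": [{"role": "user", "content": "Do I need an umbrella in NYC today?"}],
    "tools": tools,
}
print(request["tools"][0]["function"]["name"])
```

In a multi-step flow, the model returns a tool call, the caller executes it, appends the result as a `tool` message, and re-invokes the model until it produces a final answer.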

    Multimodal Reasoning:
    • Native image and video understanding via 400M parameter MoonViT encoder
    • 79.4% MMMU-Pro for multimodal understanding
    • 87.4% MathVision for visual mathematical reasoning
    • 256K context for processing large codebases, documents, and visual inputs

Model details
  • Model provider
    Moonshot AI
  • Type
    Reasoning
    Vision
    Chat
    Code
    LLM
  • Main use cases
    Reasoning
  • Features
    Function Calling
    JSON Mode
  • Speed
    High
  • Intelligence
    High
  • Deployment
    Serverless
    Monthly Reserved
  • Parameters
    1T
  • Activated parameters
    32B
  • Context length
    256K
  • Input price

    $1.20 / 1M tokens

    $0.20 / 1M tokens (cached input)

  • Output price

    $4.50 / 1M tokens

  • Input modalities
    Text
    Image
    Video
  • Output modalities
    Text
  • Released
    April 20, 2026
  • Quantization level
    FP4
  • Category
    Chat
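
Given the listed prices ($1.20 per 1M input tokens, $0.20 per 1M cached input tokens, $4.50 per 1M output tokens), per-request cost can be estimated as below; the token counts are made-up examples.

```python
# Rough per-request cost from the listed per-1M-token prices.
PRICE_INPUT = 1.20   # USD per 1M fresh (uncached) input tokens
PRICE_CACHED = 0.20  # USD per 1M cached input tokens
PRICE_OUTPUT = 4.50  # USD per 1M output tokens

def request_cost(input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    """Cost in USD for one request; cached_tokens is the cached share of input_tokens."""
    fresh = input_tokens - cached_tokens
    return (fresh * PRICE_INPUT
            + cached_tokens * PRICE_CACHED
            + output_tokens * PRICE_OUTPUT) / 1_000_000

# Example: a 200K-token prompt with half the prefix cached, and a 4K-token reply.
print(f"${request_cost(200_000, 4_000, cached_tokens=100_000):.4f}")  # → $0.1580
```

Cache hits cut the input side by 6x, which matters for agentic workloads that repeatedly resend a long shared prefix within the 256K context window.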