
Kimi K2 Thinking

State-of-the-art thinking agent with deep reasoning and tool orchestration

About model

Kimi K2 Thinking is Moonshot AI's most capable open-source thinking model, built as a thinking agent that reasons step by step while dynamically invoking tools. It sets new state-of-the-art records on Humanity's Last Exam (HLE), BrowseComp, and other benchmarks, and it dramatically scales multi-step reasoning depth while maintaining stable tool use across 200–300 sequential calls, a breakthrough in long-horizon agency. Native INT4 quantization delivers a 2x inference speed-up.

Humanity's Last Exam (w/ tools)

44.9%

Expert-level reasoning across 100+ subjects

Sequential Tool Calls

300

Stable long-horizon agency without drift

Inference Speed-Up

2x

Native INT4 quantization with QAT

Model key capabilities
  • Deep Thinking & Tool Orchestration: End-to-end trained to interleave chain-of-thought reasoning with function calls for autonomous workflows
  • Agentic Search Excellence: 60.2% BrowseComp, 56.3% Seal-0 — superior goal-directed web reasoning in information-rich environments
  • Advanced Mathematical Reasoning: 99.1% AIME 2025 (w/ python), 95.1% HMMT 2025 — elite competition-level problem solving
  • Production-Ready Efficiency: Native INT4 quantization achieving a lossless 2x speed-up, paired with a 256K context window
Performance benchmarks

Benchmarks compared: AIME 2025, GPQA Diamond, HLE, LiveCodeBench, MATH500, SWE-bench Verified. Per-model scores below are listed in source order; the table's column alignment (and the related open-source model rows) did not survive extraction.

Kimi K2 Thinking: 84.2%

Competitor closed-source models:
  • Claude Opus 4.6: 90.5%, 34.2%, 78.7%
  • OpenAI o3: 83.3%, 24.9%, 99.2%, 62.3%
  • OpenAI o1: 76.8%, 96.4%, 48.9%
  • GPT-4o: 49.2%, 2.7%, 32.3%, 89.3%, 31.0%

  • API usage

    • cURL
    • Python
    • TypeScript

    Endpoint:

    moonshotai/Kimi-K2-Thinking

    curl -X POST "https://api.together.xyz/v1/chat/completions" \
      -H "Authorization: Bearer $TOGETHER_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{
        "model": "moonshotai/Kimi-K2-Thinking",
        "messages": [
          {
            "role": "user",
            "content": "What are some fun things to do in New York?"
          }
        ]
      }'
    
    from together import Together
    
    client = Together()
    
    response = client.chat.completions.create(
      model="moonshotai/Kimi-K2-Thinking",
      messages=[
        {
          "role": "user",
          "content": "What are some fun things to do in New York?"
        }
      ]
    )
    print(response.choices[0].message.content)
    
    import Together from 'together-ai';
    const together = new Together();
    
    const completion = await together.chat.completions.create({
      model: 'moonshotai/Kimi-K2-Thinking',
      messages: [
        {
          role: 'user',
          content: 'What are some fun things to do in New York?'
        }
      ],
    });
    
    console.log(completion.choices[0].message.content);
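Function calling goes through the same chat completions endpoint. A minimal sketch of what such a request body might look like; the `get_weather` tool and its schema are invented for illustration, and only the top-level fields follow the OpenAI-compatible format used in the examples above:

```python
# Illustrative function-calling request body. The "get_weather" tool and
# its schema are invented for this sketch; only the top-level fields
# follow the OpenAI-compatible chat completions format shown above.
import json

payload = {
    "model": "moonshotai/Kimi-K2-Thinking",
    "messages": [
        {"role": "user", "content": "What's the weather in New York?"}
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",  # hypothetical tool
                "description": "Look up current weather for a city",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
}

body = json.dumps(payload)  # what would go in the POST body
print(body[:40])
```

When the model decides to use the tool, the response carries a tool call instead of plain text; the caller executes it and sends the result back as a `tool` message.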
    
  • Model card

    Architecture Overview:
    • Mixture-of-Experts (MoE) architecture with 1T total parameters and 32B activated parameters
    • 61 layers in total (including 1 dense layer); 384 experts per MoE layer, with 8 selected per token
    • Multi-head Latent Attention (MLA) mechanism with 7168 attention hidden dimension
    • Native INT4 quantization applied to MoE components through Quantization-Aware Training (QAT)
    • 256K context window enabling complex long-horizon agentic tasks
    • 160K vocabulary size with SwiGLU activation function
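The routing numbers above imply roughly how much of the network is active per token. A back-of-the-envelope sketch, under the simplifying assumption that expert weights dominate the 1T total; the gap to the quoted 32B activated parameters would be made up by attention, embeddings, and the shared dense layer:

```python
# Back-of-the-envelope MoE activation arithmetic (illustrative only: real
# activated-parameter counts also include attention, embeddings, and the
# dense layer, which is why the quoted figure is 32B rather than ~21B).
total_params = 1_000_000_000_000  # ~1T total parameters
experts = 384                     # experts per MoE layer
selected = 8                      # experts routed per token

active_fraction = selected / experts           # ~2.1% of expert capacity
approx_expert_active = total_params * active_fraction
print(f"~{approx_expert_active / 1e9:.0f}B expert parameters active per token")
```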

    Training Methodology:
    • End-to-end trained to interleave chain-of-thought reasoning with function calls
    • Quantization-Aware Training (QAT) employed in post-training stage for lossless INT4 inference
    • Specialized training for stable long-horizon agency across 200-300 consecutive tool invocations
    • Advanced reasoning depth scaling through multi-step test-time computation
    • Tool orchestration training enabling autonomous research, coding, and writing workflows

    Performance Characteristics:
    • State-of-the-art 44.9% on Humanity's Last Exam (HLE) with tools across 100+ expert subjects
    • Leading agentic search performance: 60.2% BrowseComp, 62.3% BrowseComp-ZH, 56.3% Seal-0
    • Elite mathematical reasoning: 99.1% AIME 2025 (w/ python), 95.1% HMMT 2025 (w/ python), 78.6% IMO-AnswerBench
    • Strong coding capabilities: 71.3% SWE-Bench Verified, 61.1% SWE-Bench Multilingual, 83.1% LiveCodeBench v6
    • 2x generation speed improvement through native INT4 quantization without performance degradation
    • Maintains coherent goal-directed behavior surpassing prior models that degrade after 30-50 steps
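Native INT4 stores weights as 4-bit signed integers (range -8 to 7) alongside a floating-point scale, with QAT making the round-trip essentially lossless. A minimal pure-Python sketch of symmetric INT4 quantization; the per-tensor scaling and the example weights are illustrative, not the model's actual scheme, which the card does not specify in detail:

```python
def int4_roundtrip(weights):
    """Symmetric per-tensor INT4 quantize/dequantize (illustrative sketch)."""
    scale = max(abs(w) for w in weights) / 7.0   # map max magnitude to +/-7
    q = [max(-8, min(7, round(w / scale))) for w in weights]  # 4-bit codes
    return [code * scale for code in q]          # dequantize back to float

weights = [0.91, -0.44, 0.07, 1.30, -1.02, 0.58]  # made-up example weights
restored = int4_roundtrip(weights)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(f"max abs error: {max_err:.4f}")
```

The round-trip error is bounded by half the scale step, which is why a well-calibrated (and QAT-trained) INT4 model can match full-precision quality while halving memory traffic.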

  • Applications & use cases

    Agentic Reasoning & Problem Solving:
    • Expert-level reasoning across 100+ subjects achieving 44.9% on Humanity's Last Exam with tools
    • PhD-level mathematical problem solving through 23+ interleaved reasoning and tool calls
    • Elite competition mathematics: 99.1% AIME 2025, 95.1% HMMT 2025 with Python tools
    • Dynamic hypothesis generation, evidence verification, and coherent answer construction

    Agentic Search & Web Reasoning:
    • State-of-the-art 60.2% BrowseComp performance, significantly outperforming 29.2% human baseline
    • Continuous browsing, searching, and reasoning over hard-to-find real-world web information
    • 200-300 sequential tool calls for deep research workflows without human intervention
    • Goal-directed web-based reasoning with adaptive hypothesis refinement
    • Financial search: 47.4% FinSearchComp-T3, 87.0% Frames benchmark

    Agentic Coding & Software Development:
    • Production-level coding: 71.3% SWE-Bench Verified, 61.1% SWE-Bench Multilingual, 41.9% Multi-SWE-bench
    • Component-heavy frontend development: fully functional HTML, React, and responsive web applications from single prompts
    • Multi-step development workflows with precision tool invocation and adaptive reasoning
    • Terminal automation: 47.1% Terminal-Bench with simulated tools
    • Competitive programming: 83.1% LiveCodeBench v6, 48.7% OJ-Bench (C++)

    Creative & Practical Writing:
    • Creative writing with vivid imagery, emotional depth, and thematic resonance
    • Fiction, cultural reviews, and science fiction with natural fluency and style command
    • Academic and research writing with rigorous logic, thoroughness, and substantive richness
    • 73.8% Longform Writing benchmark demonstrating instruction adherence and perspective breadth
    • Personal and emotional responses with empathy, nuance, and actionable guidance

    Long-Horizon Autonomous Workflows:
    • Research automation executing hundreds of coherent reasoning steps
    • Office automation and document generation workflows
    • Multi-step coding projects from ideation to functional products
    • Complex problem decomposition into clear, actionable subtasks
    • Stable agency surpassing models that degrade after 30-50 steps
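The long-horizon pattern described above, interleaving a reasoning step with a tool call, feeding the observation back, and repeating for up to hundreds of steps, can be sketched as a plain loop. Everything here is a mock: `model_step` is a stand-in for an actual model call and the "tool" is a toy function; only the control flow mirrors the agent pattern:

```python
# Mock of the think -> call tool -> observe -> repeat loop described above.
# `model_step` is a stub policy standing in for a real model call; only
# the control flow reflects the long-horizon agent pattern.

def add(a: int, b: int) -> int:
    return a + b

TOOLS = {"add": add}  # registry of callable tools

def model_step(state: dict):
    """Stub policy: keep summing until the running total exceeds 100."""
    if state["total"] > 100:
        return {"type": "final", "answer": state["total"]}
    return {"type": "tool", "name": "add", "args": {"a": state["total"], "b": 27}}

def run_agent(max_steps: int = 300):
    state = {"total": 0}
    for step in range(max_steps):
        action = model_step(state)
        if action["type"] == "final":                      # model decides it is done
            return action["answer"], step
        result = TOOLS[action["name"]](**action["args"])   # execute the tool call
        state["total"] = result                            # feed observation back
    raise RuntimeError("step budget exhausted")

answer, steps = run_agent()
print(answer, steps)  # prints: 108 4
```

The "stable agency" claim is about exactly this loop staying coherent for 200-300 iterations with a real model in place of the stub.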

Model details
  • Model provider
    Moonshot AI
  • Type
    Chat
    Code
    LLM
  • Main use cases
    Chat
    Reasoning
    Function Calling
  • Features
    Function Calling
  • Fine tuning
    Supported
  • Deployment
    Serverless
    On-Demand Dedicated
    Monthly Reserved
  • Parameters
    1T
  • Activated parameters
    32B
  • Context length
    256K
  • Input price

    $1.20 / 1M tokens

  • Output price

    $4.00 / 1M tokens

  • Input modalities
    Text
  • Output modalities
    Text
  • Released
    November 4, 2025
  • Last updated
    November 9, 2025
  • Quantization level
    INT4
  • Category
    Chat
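At the listed rates ($1.20 per 1M input tokens, $4.00 per 1M output tokens), per-request cost is simple arithmetic. A quick sketch; the token counts are made-up examples:

```python
# Cost estimate at the listed serverless rates; token counts are illustrative.
INPUT_PER_M = 1.20   # USD per 1M input tokens
OUTPUT_PER_M = 4.00  # USD per 1M output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    return input_tokens / 1e6 * INPUT_PER_M + output_tokens / 1e6 * OUTPUT_PER_M

# e.g. a long-context request: 200K tokens in, 8K tokens out
print(f"${request_cost(200_000, 8_000):.3f}")  # prints $0.272
```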