Models / Moonshot AI
LLM

Kimi K2 Instruct

State-of-the-art mixture-of-experts agentic intelligence model with 1T parameters, 128K context, and native tool use

About model

Kimi K2 Instruct is a post-trained model for general-purpose chat and agentic experiences, excelling in tool use, reasoning, and autonomous problem-solving. It is designed for drop-in use, providing reflex-grade responses without long thinking, and suits researchers and builders seeking a strong foundation for custom solutions.

  • API usage

    • cURL
    • Python
    • TypeScript

    Endpoint:

    moonshotai/Kimi-K2-Instruct

    curl -X POST "https://api.together.xyz/v1/chat/completions" \
      -H "Authorization: Bearer $TOGETHER_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{
        "model": "moonshotai/Kimi-K2-Instruct",
        "messages": [
          {
            "role": "user",
            "content": "What are some fun things to do in New York?"
          }
        ]
      }'
    
    from together import Together
    
    client = Together()
    
    response = client.chat.completions.create(
      model="moonshotai/Kimi-K2-Instruct",
      messages=[
        {
          "role": "user",
          "content": "What are some fun things to do in New York?"
        }
      ]
    )
    print(response.choices[0].message.content)
    
    import Together from 'together-ai';
    const together = new Together();
    
    const completion = await together.chat.completions.create({
      model: 'moonshotai/Kimi-K2-Instruct',
      messages: [
        {
          role: 'user',
          content: 'What are some fun things to do in New York?'
         }
      ],
    });
    
    console.log(completion.choices[0].message.content);
    
  • How to use model

    Get started with this model in 10 lines of code! The model ID is moonshotai/Kimi-K2-Instruct, and pricing is $1.00 per 1M input tokens and $3.00 per 1M output tokens.

        
          from together import Together
    
          client = Together()
          resp = client.chat.completions.create(
              model="moonshotai/Kimi-K2-Instruct",
              messages=[{"role":"user","content":"Code a hacker news clone"}],
              stream=True,
          )
          for tok in resp:
              content = tok.choices[0].delta.content
              if content:  # skip chunks with no text (e.g., the final chunk)
                  print(content, end="", flush=True)
        
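At the listed serverless prices, per-request cost is simple to estimate. The helper below is an illustrative sketch (`estimate_cost` is not part of the Together SDK), using the token counts reported in a response's usage field:

```python
# Back-of-the-envelope cost estimate for Kimi K2 Instruct on Together,
# using the listed serverless prices.
INPUT_PRICE_PER_M = 1.00   # USD per 1M input tokens
OUTPUT_PRICE_PER_M = 3.00  # USD per 1M output tokens

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of a single request."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

# e.g. a 4,000-token prompt producing a 1,000-token answer:
print(f"${estimate_cost(4_000, 1_000):.4f}")  # $0.0070
```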
    
  • Model card

    Architecture Overview:

    • 1T-parameter MoE with 32B activated parameters
    • Hybrid MoE sparsity for compute efficiency
    • 128K token context for deep document comprehension
    • Agentic design with native tool usage & CLI integration
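The sparsity figures above imply that only a small fraction of the network participates in each forward pass, which is where the compute efficiency comes from; a quick check:

```python
# MoE sparsity check: 32B of 1T parameters are activated per token.
total_params = 1_000_000_000_000   # 1T total parameters
active_params = 32_000_000_000     # 32B activated per token

fraction = active_params / total_params
print(f"{fraction:.1%} of parameters active per forward pass")  # 3.2%
```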

    Training Methodology:

    • Pre-trained on 15.5 T tokens using MuonClip optimizer for stability
    • Zero-instability training at large scale

    Performance Characteristics:

    • SOTA on LiveCodeBench v6, AIME 2025, MMLU-Redux, and SWE-bench (agentic)
  • Prompting

    • Use natural language instructions or tool commands
    • Temperature ≈ 0.6: calibrated for this model's post-training; higher values tend to produce verbose, less focused output
    • Kimi K2 autonomously invokes tools to fulfill tasks: pass a JSON schema in tools=[…] and set tool_choice="auto"; Kimi decides when and what to call
    • Supports multi-turn dialogues and chained workflows: because the model is agentic, give a high-level objective ("Analyze this CSV and write a report") and let it orchestrate sub-tasks
    • Chunk very long contexts: 128K is large, but response speed drops on inputs above ~100K tokens; supply a short executive brief in the final user message to focus the model
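Passing a tool schema with tool_choice="auto" can be sketched as follows. The weather tool, its JSON schema, and the get_weather stub are hypothetical examples, not part of the Together SDK; the create() call follows the OpenAI-style chat-completions interface shown in the API usage section above:

```python
import json

# Hypothetical tool: the name, schema, and get_weather stub below are
# illustrative, not part of the Together SDK.
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

def get_weather(city: str) -> str:
    """Stub implementation; replace with a real weather API call."""
    return f"Sunny in {city}"

def ask_with_tools(client):
    """One round trip: let Kimi decide whether to call the tool."""
    response = client.chat.completions.create(
        model="moonshotai/Kimi-K2-Instruct",
        messages=[{"role": "user", "content": "What's the weather in Paris?"}],
        tools=[weather_tool],
        tool_choice="auto",  # the model decides when/what to call
    )
    message = response.choices[0].message
    for call in message.tool_calls or []:
        args = json.loads(call.function.arguments)
        return get_weather(**args)  # execute the requested tool locally
    return message.content          # no tool call: plain answer
```

With TOGETHER_API_KEY set, `ask_with_tools(Together())` returns either the locally executed tool result or a direct model answer.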
  • Applications & use cases

    Kimi K2 shines in scenarios that require autonomous problem-solving, especially coding and tool use:

    • Agentic Workflows: Automate multi-step tasks like booking flights, research, or data analysis using tools/APIs
    • Coding & Debugging: Solve software engineering tasks (e.g., SWE-bench), generate patches, or debug code
    • Research & Report Generation: Summarize technical documents, analyze trends, or draft reports using long-context capabilities
    • STEM Problem-Solving: Tackle advanced math (AIME, MATH), logic puzzles (ZebraLogic), or scientific reasoning
    • Tool Integration: Build AI agents that interact with APIs (e.g., weather data, databases)
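The agentic workflows above share one basic control loop: call the model, execute any tools it requests, feed the results back, and repeat until it answers directly. The sketch below is framework-agnostic; run_agent, call_model, and dispatch are illustrative names, not Together SDK helpers, and call_model stands in for one chat-completions round trip returning (content, tool_calls) with tool_calls as (call_id, name, json_args) tuples:

```python
import json

# Minimal agentic loop (a sketch, not an official Together helper).
# `dispatch` maps tool names to local Python callables.
def run_agent(call_model, messages, dispatch, max_steps=5):
    for _ in range(max_steps):
        content, tool_calls = call_model(messages)
        if not tool_calls:                  # no tool requested: final answer
            return content
        # (a real loop would also append the assistant tool-call message
        # here, in the provider's wire format)
        for call_id, name, args in tool_calls:
            result = dispatch[name](**json.loads(args))
            # Feed the tool result back so the model can continue.
            messages.append({"role": "tool", "tool_call_id": call_id,
                             "content": str(result)})
    return None  # step budget exhausted
```

Capping the loop with max_steps keeps a misbehaving agent from spinning on tool calls indefinitely.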
  • Model provider
    Moonshot AI
  • Type
    LLM
  • Main use cases
    Chat
    Coding Agents
  • Features
    JSON Mode
  • Fine tuning
    Supported
  • Deployment
    Serverless
    On-Demand Dedicated
  • Parameters
    1 Trillion
  • Activated parameters
    32B
  • Context length
    128K tokens
  • Input price

    $1.00 / 1M tokens

  • Output price

    $3.00 / 1M tokens

  • Input modalities
    Text
  • Output modalities
    Text
  • Released
    July 10, 2025
  • Last updated
    July 13, 2025
  • Quantization level
    FP8
  • Category
    Chat