
Kimi K2 Thinking API

State-of-the-art thinking agent with deep reasoning and tool orchestration


This model is not currently supported on Together AI.

Visit our Models page to view all the latest models.

Introducing Kimi K2 Thinking

Kimi K2 Thinking is Moonshot AI's most capable open-source thinking model, built as a thinking agent that reasons step by step while dynamically invoking tools. It sets new state-of-the-art results on Humanity's Last Exam (HLE), BrowseComp, and other benchmarks, and it scales multi-step reasoning depth while keeping tool use stable across 200-300 sequential calls. Native INT4 quantization provides a 2x inference speed-up.

44.9% Humanity's Last Exam (w/ tools): Expert-level reasoning across 100+ subjects
300 Sequential Tool Calls: Stable long-horizon agency without drift
2x Inference Speed-Up: Native INT4 quantization with QAT

Key Capabilities
• Deep Thinking & Tool Orchestration: End-to-end trained to interleave chain-of-thought reasoning with function calls for autonomous workflows
• Agentic Search Excellence: 60.2% BrowseComp, 56.3% Seal-0; superior goal-directed web reasoning in information-rich environments
• Advanced Mathematical Reasoning: 99.1% AIME 2025 (w/ python), 95.1% HMMT 2025; elite competition-level problem solving
• Production-Ready Efficiency: Native INT4 quantization achieving lossless 2x speed improvements with 256K context window

Kimi K2 Thinking API Usage

Endpoint

curl -X POST "https://api.together.xyz/v1/chat/completions" \
  -H "Authorization: Bearer $TOGETHER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "moonshotai/Kimi-K2-Thinking",
    "messages": [
      {
        "role": "user",
        "content": "What are some fun things to do in New York?"
      }
    ]
}'
curl -X POST https://api.together.xyz/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $TOGETHER_API_KEY" \
  -d '{
    "model": "moonshotai/Kimi-K2-Thinking",
    "messages": [{
      "role": "user",
      "content": "Given two binary strings `a` and `b`, return their sum as a binary string"
    }]
  }'

from together import Together

client = Together()

response = client.chat.completions.create(
  model="moonshotai/Kimi-K2-Thinking",
  messages=[
    {
      "role": "user",
      "content": "What are some fun things to do in New York?"
    }
  ]
)
print(response.choices[0].message.content)

from together import Together

client = Together()

response = client.chat.completions.create(
  model="moonshotai/Kimi-K2-Thinking",
  messages=[
    {
      "role": "user",
      "content": "Given two binary strings `a` and `b`, return their sum as a binary string"
    }
  ],
)

print(response.choices[0].message.content)
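
The chat examples above send plain prompts. Because the model is described as interleaving reasoning with function calls, a request would usually also declare the tools it may call. The following is a minimal sketch of such a request body, assuming the endpoint accepts the OpenAI-style `tools` field for this model; the `get_weather` tool and the `build_payload` helper are hypothetical names used only for illustration. The sketch builds and prints the JSON body and makes no network call.

```python
import json

# Hypothetical tool definition in the OpenAI-style function-calling schema.
# Whether the endpoint accepts a `tools` field for this model is an assumption.
WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Return the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}


def build_payload(prompt, tools):
    """Build a chat-completions request body that declares tools."""
    return {
        "model": "moonshotai/Kimi-K2-Thinking",
        "messages": [{"role": "user", "content": prompt}],
        "tools": tools,
    }


payload = build_payload("What is the weather in New York?", [WEATHER_TOOL])
print(json.dumps(payload, indent=2))
```

The printed body could be sent with the same headers as the curl chat example; the response handling for tool calls is left out here because it depends on the serving API.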

import Together from 'together-ai';
const together = new Together();

const completion = await together.chat.completions.create({
  model: 'moonshotai/Kimi-K2-Thinking',
  messages: [
    {
      role: 'user',
      content: 'What are some fun things to do in New York?'
    }
  ],
});

console.log(completion.choices[0].message.content);

import Together from "together-ai";

const together = new Together();

async function main() {
  const response = await together.chat.completions.create({
    model: "moonshotai/Kimi-K2-Thinking",
    messages: [{
      role: "user",
      content: "Given two binary strings `a` and `b`, return their sum as a binary string"
    }]
  });
  
  console.log(response.choices[0]?.message?.content);
}

main();

How to use Kimi K2 Thinking

Model details

Architecture Overview:
• Mixture-of-Experts (MoE) architecture with 1T total parameters and 32B activated parameters
• 61 layers in total, including 1 dense layer; 384 experts, with 8 selected per token
• Multi-head Latent Attention (MLA) mechanism with an attention hidden dimension of 7168
• Native INT4 quantization applied to MoE components through Quantization-Aware Training (QAT)
• 256K context window enabling complex long-horizon agentic tasks
• 160K vocabulary size with SwiGLU activation function
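
To make the "384 experts selecting 8 per token" figure concrete, here is a minimal sketch of top-k expert routing. It is a generic illustration, not Moonshot AI's router: the softmax over the selected scores, the random stand-in logits, and the `route` helper are all assumptions, since the page does not specify the gating function.

```python
import math
import random

NUM_EXPERTS = 384  # from the architecture list above
TOP_K = 8          # experts selected per token


def route(logits, k=TOP_K):
    """Pick the k highest-scoring experts and softmax-normalize their weights.

    Softmax over the selected scores is an assumed gating choice.
    """
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    peak = max(logits[i] for i in top)
    exps = [math.exp(logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


random.seed(0)
# Stand-in router scores for one token; a real router computes these from the hidden state.
router_logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
chosen = route(router_logits)

print(len(chosen))                            # experts used for this token
print(round(sum(w for _, w in chosen), 6))    # gate weights sum to 1
print(f"{TOP_K / NUM_EXPERTS:.2%} of experts active per token")
print(f"{32e9 / 1e12:.1%} of parameters active per token")
```

Only 8 of 384 experts (about 2.08%) run for each token, which is how a model with 1T total parameters can activate roughly 32B (3.2%) per token.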

Training Methodology:
• End-to-end trained to interleave chain-of-thought reasoning with function calls
• Quantization-Aware Training (QAT) employed in post-training stage for lossless INT4 inference
• Specialized training for stable long-horizon agency across 200-300 consecutive tool invocations
• Advanced reasoning depth scaling through multi-step test-time computation
• Tool orchestration training enabling autonomous research, coding, and writing workflows
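
The interleaving of reasoning and function calls described above can be pictured as a loop: the model replies, any requested tool is executed, and the result is appended to the conversation before the next model call. The sketch below is an assumption-laden illustration, not the model's actual protocol: `fake_model` stands in for a real chat request, the message shape is simplified, and the `add` tool and `TOOLS` registry are hypothetical.

```python
import json

MAX_STEPS = 300  # upper end of the 200-300 call range cited above


def add(a, b):
    return a + b


TOOLS = {"add": add}  # hypothetical tool registry


def fake_model(messages):
    """Stand-in for the model: requests one tool call, then answers.

    A real client would send `messages` to the chat endpoint instead.
    """
    tool_results = [m for m in messages if m["role"] == "tool"]
    if not tool_results:
        return {
            "role": "assistant",
            "content": None,
            "tool_call": {"name": "add", "arguments": json.dumps({"a": 2, "b": 3})},
        }
    return {"role": "assistant", "content": f"The sum is {tool_results[-1]['content']}."}


def run_agent(prompt, model=fake_model, max_steps=MAX_STEPS):
    """Alternate model replies and tool executions until a final answer appears."""
    messages = [{"role": "user", "content": prompt}]
    for _ in range(max_steps):
        reply = model(messages)
        messages.append(reply)
        call = reply.get("tool_call")
        if call is None:  # no tool requested, so this is the final answer
            return reply["content"], messages
        args = json.loads(call["arguments"])
        result = TOOLS[call["name"]](**args)
        messages.append({"role": "tool", "name": call["name"], "content": str(result)})
    raise RuntimeError("step budget exhausted without a final answer")


answer, history = run_agent("What is 2 + 3?")
print(answer)
```

The explicit step cap is a design choice for the sketch: it bounds runaway loops while leaving room for the long tool sequences the page describes.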

Performance Characteristics:
• State-of-the-art 44.9% on Humanity's Last Exam (HLE) with tools across 100+ expert subjects
• Leading agentic search performance: 60.2% BrowseComp, 62.3% BrowseComp-ZH, 56.3% Seal-0
• Elite mathematical reasoning: 99.1% AIME 2025 (w/ python), 95.1% HMMT 2025 (w/ python), 78.6% IMO-AnswerBench
• Strong coding capabilities: 71.3% SWE-Bench Verified, 61.1% SWE-Bench Multilingual, 83.1% LiveCodeBench v6
• 2x generation speed improvement through native INT4 quantization without performance degradation
• Maintains coherent goal-directed behavior surpassing prior models that degrade after 30-50 steps
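
As a rough illustration of what INT4 weight quantization involves, the sketch below applies generic symmetric per-group quantization to a short list of weights. It is not Moonshot AI's QAT recipe: the group size, the symmetric scheme, and the helper names `quantize_int4` and `dequantize_int4` are assumptions chosen for clarity.

```python
def quantize_int4(weights, group_size=32):
    """Symmetric per-group quantization to integer codes in [-8, 7]."""
    codes, scales = [], []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        max_abs = max(abs(w) for w in group)
        scale = max_abs / 7 if max_abs > 0 else 1.0  # map the largest value to +/-7
        scales.append(scale)
        codes.extend(max(-8, min(7, round(w / scale))) for w in group)
    return codes, scales


def dequantize_int4(codes, scales, group_size=32):
    """Map 4-bit codes back to floats using each group's scale."""
    return [c * scales[i // group_size] for i, c in enumerate(codes)]


weights = [0.7, -0.33, 0.1, 0.0, -0.7, 0.2, 0.04, -0.16]
codes, scales = quantize_int4(weights, group_size=4)
restored = dequantize_int4(codes, scales, group_size=4)
max_err = max(abs(w - r) for w, r in zip(weights, restored))

print(codes)
print(max_err)
```

Each weight is stored as a 4-bit code plus a shared per-group scale, and the rounding error per weight stays within half a scale step. Quantization-aware training, as mentioned above, exposes the model to this rounding during training so that accuracy holds up at inference time.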

Prompting Kimi K2 Thinking

Applications & Use Cases

Agentic Reasoning & Problem Solving:
• Expert-level reasoning across 100+ subjects achieving 44.9% on Humanity's Last Exam with tools
• PhD-level mathematical problem solving through 23 or more interleaved reasoning steps and tool calls
• Elite competition mathematics: 99.1% AIME 2025, 95.1% HMMT 2025 with Python tools
• Dynamic hypothesis generation, evidence verification, and coherent answer construction

Agentic Search & Web Reasoning:
• State-of-the-art 60.2% BrowseComp performance, significantly outperforming the 29.2% human baseline
• Continuous browsing, searching, and reasoning over hard-to-find real-world web information
• 200-300 sequential tool calls for deep research workflows without human intervention
• Goal-directed web-based reasoning with adaptive hypothesis refinement
• Financial search: 47.4% FinSearchComp-T3, 87.0% Frames benchmark

Agentic Coding & Software Development:
• Production-level coding: 71.3% SWE-Bench Verified, 61.1% SWE-Bench Multilingual, 41.9% Multi-SWE-bench
• Component-heavy frontend development: fully functional HTML, React, and responsive web applications from single prompts
• Multi-step development workflows with precision tool invocation and adaptive reasoning
• Terminal automation: 47.1% Terminal-Bench with simulated tools
• Competitive programming: 83.1% LiveCodeBench v6, 48.7% OJ-Bench (C++)

Creative & Practical Writing:
• Creative writing with vivid imagery, emotional depth, and thematic resonance
• Fiction, cultural reviews, and science fiction with natural fluency and style command
• Academic and research writing with rigorous logic, thoroughness, and substantive richness
• 73.8% Longform Writing benchmark demonstrating instruction adherence and perspective breadth
• Personal and emotional responses with empathy, nuance, and actionable guidance

Long-Horizon Autonomous Workflows:
• Research automation executing hundreds of coherent reasoning steps
• Office automation and document generation workflows
• Multi-step coding projects from ideation to functional products
• Complex problem decomposition into clear, actionable subtasks
• Stable agency surpassing models that degrade after 30-50 steps
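
Long workflows of this kind have to fit inside the context window. The sketch below shows one simple way an application might budget context across many tool steps. It is an illustration under stated assumptions: "256K" is read as 256,000 tokens, the four-characters-per-token estimate is a crude heuristic, and `estimate_tokens` and `trim_history` are hypothetical helpers. The page does not describe how the model or serving stack manages context.

```python
CONTEXT_TOKENS = 256_000  # "256K" read as 256,000; the exact limit is an assumption
MAX_TOOL_CALLS = 300      # upper end of the range cited on this page


def estimate_tokens(text):
    """Crude heuristic: about 4 characters per token (an assumption)."""
    return max(1, len(text) // 4)


def trim_history(messages, budget):
    """Drop the oldest messages after the task until the estimate fits the budget.

    The first message (the task itself) is always kept.
    """
    kept = list(messages)
    while len(kept) > 1 and sum(estimate_tokens(m["content"]) for m in kept) > budget:
        del kept[1]  # remove the oldest message after the task
    return kept


# Average number of tokens available per step if all 300 calls share one window.
print(CONTEXT_TOKENS // MAX_TOOL_CALLS)

history = [{"role": "user", "content": "Research task"}]
history += [{"role": "tool", "content": "x" * 400} for _ in range(10)]
trimmed = trim_history(history, budget=500)
print(len(trimmed))
```

With these numbers, an even split leaves roughly 853 tokens per step, so long tool outputs would need to be summarized or trimmed in a real workflow.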

Looking for production scale? Deploy on a dedicated endpoint

Deploy Kimi K2 Thinking on a dedicated endpoint with custom hardware configuration, as many instances as you need, and auto-scaling.
