Kimi K2.7 Code

Coding-focused agentic model with improved long-horizon task completion and thinking efficiency

About model

Kimi K2.7 Code is Moonshot AI's coding-focused agentic model. Built on Kimi K2.6, it delivers substantial improvements on real-world long-horizon coding tasks while using approximately 30% fewer thinking tokens than K2.6. It keeps the same 1T-parameter (32B activated) MoE architecture with the MoonViT vision encoder, a 256K context window, and native INT4 quantization. Thinking and preserve_thinking are always enabled, which keeps reasoning consistent across multi-turn agentic sessions. The model scores 62.0 on Kimi Code Bench V2, 81.1% on MCP Mark Verified, and 76.0% on MCP Atlas.

Kimi Code Bench V2: 62.0
Coding benchmark across 10+ languages and production tech stacks

Fewer Thinking Tokens: approximately 30%
vs Kimi K2.6, with improved long-horizon task completion

MCP Mark Verified: 81.1%
Tool use across Notion, GitHub, Filesystem, Postgres, and Playwright

Model key capabilities

Long-Horizon Coding: Substantial improvements on real-world software engineering tasks across backend services, infrastructure, performance engineering, systems programming, security, and frontend development
Thinking Efficiency: Approximately 30% reduction in thinking-token usage vs Kimi K2.6, improving cost efficiency across long agentic coding sessions without sacrificing task completion quality
Agentic Tool Use: 81.1% MCP Mark Verified and 76.0% MCP Atlas across real MCP server environments including Notion, GitHub, Filesystem, Postgres, and Playwright
Multimodal Input: Text, image, and video input via MoonViT encoder with 256K context and preserve_thinking for consistent multi-turn agentic execution

API usage

Endpoint: moonshotai/Kimi-K2.7-Code

cURL:

curl -X POST "https://api.together.xyz/v1/chat/completions" \
  -H "Authorization: Bearer $TOGETHER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "moonshotai/Kimi-K2.7-Code",
    "messages": [
      {
        "role": "user",
        "content": "What are some fun things to do in New York?"
      }
    ]
}'

Python:

from together import Together

client = Together()

response = client.chat.completions.create(
  model="moonshotai/Kimi-K2.7-Code",
  messages=[
    {
      "role": "user",
      "content": "What are some fun things to do in New York?"
    }
  ]
)
print(response.choices[0].message.content)

TypeScript:

import Together from 'together-ai';
const together = new Together();

const completion = await together.chat.completions.create({
  model: 'moonshotai/Kimi-K2.7-Code',
  messages: [
    {
      role: 'user',
      content: 'What are some fun things to do in New York?'
    }
  ],
});

console.log(completion.choices[0].message.content);
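
Because thinking is always enabled, a full response can take a while to arrive, so streaming may be worth considering. The following is a minimal sketch, assuming the Together Python SDK accepts `stream=True` and yields chunks with OpenAI-style deltas (`chunk.choices[0].delta.content`). `collect_stream` is a hypothetical helper written for this example, not part of the SDK.

```python
from typing import Iterable


def collect_stream(chunks: Iterable) -> str:
    """Concatenate the text deltas from a streamed chat completion."""
    parts = []
    for chunk in chunks:
        # Some chunks (for example a trailing usage-only chunk) may carry no choices.
        if not getattr(chunk, "choices", None):
            continue
        delta = chunk.choices[0].delta
        text = getattr(delta, "content", None)
        if text:
            parts.append(text)
    return "".join(parts)


# Usage (needs TOGETHER_API_KEY and network access, so it is left commented out):
# from together import Together
# client = Together()
# stream = client.chat.completions.create(
#     model="moonshotai/Kimi-K2.7-Code",
#     messages=[{"role": "user", "content": "Write a binary search in Go."}],
#     stream=True,
# )
# print(collect_stream(stream))
```

In an interactive tool you would likely print each delta as it arrives instead of joining them at the end.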

Model card
Architecture Overview:
• 1T total parameter MoE with 32B parameters activated per token
• 384 experts with 8 selected per token plus 1 shared expert; Multi-head Latent Attention (MLA)
• MoonViT vision encoder (400M parameters) for image and video input
• 256K token context window
• Native INT4 quantization
• Thinking and preserve_thinking always enabled — thinking mode cannot be disabled
• Same base architecture as Kimi K2.5 and K2.6; improvements come from coding-focused post-training
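
To make the "384 experts with 8 selected per token plus 1 shared expert" line concrete, here is a toy sketch of generic top-k expert routing. It is an illustration under stated assumptions, not Moonshot AI's implementation: the softmax over selected logits, the scalar "experts", and every function name are hypothetical, and real routers often differ in gating and normalization.

```python
import math
import random

NUM_EXPERTS = 384  # routed experts, from the model card
TOP_K = 8          # routed experts selected per token, from the model card


def route_token(router_logits):
    """Return (expert_index, weight) pairs for the top-k routed experts.

    Weights are a softmax over the selected logits only, so they sum to 1.
    """
    order = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)
    top = order[:TOP_K]
    peak = max(router_logits[i] for i in top)  # subtract the max for numerical stability
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def moe_output(x, router_logits, experts, shared_expert):
    """Combine the always-on shared expert with the weighted top-k routed experts."""
    out = shared_expert(x)
    for idx, weight in route_token(router_logits):
        out += weight * experts[idx](x)
    return out


# Toy demo: each "expert" just scales its scalar input by a fixed factor.
rng = random.Random(0)
experts = [lambda x, s=rng.uniform(0.5, 1.5): s * x for _ in range(NUM_EXPERTS)]
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
selected = route_token(logits)
y = moe_output(1.0, logits, experts, shared_expert=lambda x: x)
print(len(selected), "routed experts used out of", NUM_EXPERTS)
```

Only the selected experts run for a given token, which is why the activated parameter count (32B) is a small fraction of the total (1T).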

Training Methodology:
• Coding-focused post-training built on Kimi K2.6
• Targets real-world long-horizon coding tasks: production incidents, open-source projects, and internal engineering use cases
• Achieves approximately 30% reduction in thinking-token usage vs K2.6

Performance Characteristics:
• Coding: 62.0 Kimi Code Bench V2 (+11.1 over K2.6), 53.6 Program Bench, 35.1 MLS-Bench Lite
• Agentic: 81.1% MCP Mark Verified (+8.3 over K2.6), 76.0% MCP Atlas (+6.6 over K2.6), 46.9 Kimi Claw 24/7 Bench
• Recommended settings: temperature=1.0, top_p=0.95, with up to 256K tokens of context
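
The deltas quoted above imply K2.6 baselines that the page does not state directly. The arithmetic can be sketched as follows, assuming each "+N over K2.6" figure is an absolute difference in points; the derived baselines are a reading of the stated numbers, not separately reported results.

```python
# (K2.7 Code score, stated gain over K2.6), copied from the model card.
reported = {
    "Kimi Code Bench V2": (62.0, 11.1),
    "MCP Mark Verified": (81.1, 8.3),
    "MCP Atlas": (76.0, 6.6),
}


def implied_baseline(score, delta):
    """K2.6 score implied by a K2.7 Code score and its absolute gain."""
    return round(score - delta, 1)


baselines = {name: implied_baseline(s, d) for name, (s, d) in reported.items()}
for name, base in baselines.items():
    print(f"{name}: implied K2.6 score {base}")
```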

Prompting
Together AI API Access:
• Access Kimi K2.7 Code via Together AI APIs using the endpoint moonshotai/Kimi-K2.7-Code
• Authenticate using your Together AI API key in request headers
• Thinking and preserve_thinking are always enabled — reasoning context is retained across all turns automatically
• Recommended parameters: temperature=1.0, top_p=0.95
• Available on Together AI serverless and dedicated infrastructure
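
One way to apply the recommended sampling parameters consistently across a multi-turn session is to build each request from a shared helper. This is a minimal sketch: `build_request` and `record_reply` are hypothetical names, and it assumes that resending the visible message history is enough, since the page does not describe how preserve_thinking carries reasoning content between turns.

```python
MODEL = "moonshotai/Kimi-K2.7-Code"
RECOMMENDED = {"temperature": 1.0, "top_p": 0.95}  # values from the model card


def build_request(history, user_message, **overrides):
    """Return keyword arguments for client.chat.completions.create() for the next turn."""
    messages = list(history) + [{"role": "user", "content": user_message}]
    return {"model": MODEL, "messages": messages, **RECOMMENDED, **overrides}


def record_reply(request, reply_text):
    """Return the conversation history extended with the assistant's reply."""
    return request["messages"] + [{"role": "assistant", "content": reply_text}]


# Usage (needs TOGETHER_API_KEY and network access, so it is left commented out):
# from together import Together
# client = Together()
# req = build_request([], "Refactor utils.py to remove the global cache.")
# resp = client.chat.completions.create(**req)
# history = record_reply(req, resp.choices[0].message.content)
# req = build_request(history, "Now add unit tests for the refactored module.")
```

Keeping the defaults in one place makes it harder for a single turn in a long agentic session to drift to different sampling settings.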

Applications & use cases
Long-Horizon Software Engineering:
• End-to-end task completion across backend services, infrastructure, and performance engineering
• Production incident resolution and open-source project contributions
• Security engineering, systems programming, and ML/data engineering
• Multi-file refactors and complex debugging across large codebases

Agentic Tool Orchestration:
• MCP tool use across Notion, GitHub, Filesystem, Postgres, and Playwright environments
• Persistent multi-day coworking tasks spanning software engineering, ML research, and trading
• Autonomous long-horizon execution with consistent reasoning via preserve_thinking
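
The tool-use loop behind these workflows can be sketched with plain function calling. The example below is not an MCP client: it assumes the endpoint accepts an OpenAI-compatible `tools` list and returns tool calls with a name and a JSON string of arguments. The `list_directory` tool and the `run_tool_call` dispatcher are hypothetical, chosen only to echo the Filesystem environment mentioned above.

```python
import json
import os

# Hypothetical tool schema in the OpenAI-compatible "tools" format.
LIST_DIR_TOOL = {
    "type": "function",
    "function": {
        "name": "list_directory",
        "description": "List the entries of a directory on the local filesystem.",
        "parameters": {
            "type": "object",
            "properties": {"path": {"type": "string", "description": "Directory path"}},
            "required": ["path"],
        },
    },
}


def list_directory(path):
    """Return the sorted entry names of a directory."""
    return sorted(os.listdir(path))


HANDLERS = {"list_directory": list_directory}


def run_tool_call(name, arguments_json):
    """Execute one tool call and return its result as JSON text for a 'tool' message."""
    handler = HANDLERS.get(name)
    if handler is None:
        return json.dumps({"error": f"unknown tool: {name}"})
    try:
        args = json.loads(arguments_json)
        return json.dumps({"result": handler(**args)})
    except Exception as exc:  # report failures to the model instead of crashing the loop
        return json.dumps({"error": str(exc)})


# Usage (commented out because it needs network access):
# resp = client.chat.completions.create(model=..., messages=..., tools=[LIST_DIR_TOOL])
# for call in resp.choices[0].message.tool_calls or []:
#     output = run_tool_call(call.function.name, call.function.arguments)
```

Returning errors as data lets the model see what went wrong and retry with corrected arguments, which matters in long-running sessions.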

Multimodal Coding Workflows:
• Image and video input for visual coding, UI generation from screenshots, and design-to-code workflows
• Frontend development from visual references with 256K context for large projects
• Cross-modal reasoning across code, images, and documentation in a single session
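
For screenshot-to-code style requests, an image can be sent alongside the text prompt. The sketch below assumes the endpoint accepts OpenAI-style `image_url` content parts with base64 data URLs; `image_message` is a hypothetical helper. The page does not say how video input is passed, so video is left out.

```python
import base64


def image_message(prompt, image_bytes, mime_type="image/png"):
    """Build a user message that pairs a text prompt with an inline image."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {
                "type": "image_url",
                "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
            },
        ],
    }


# Usage (commented out because it needs a real image and network access):
# with open("mockup.png", "rb") as f:
#     msg = image_message("Recreate this layout in HTML and CSS.", f.read())
# resp = client.chat.completions.create(
#     model="moonshotai/Kimi-K2.7-Code", messages=[msg]
# )
```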

Model specifications

Model data

Model provider: Moonshot AI
Type: Vision, Chat, Code, LLM
Main use cases: Coding Agents
Features: Function Calling, JSON Mode
Deployment: Serverless, On-Demand Dedicated
Endpoint: moonshotai/Kimi-K2.7-Code
Context length: 256K
Input price: $0.95 / 1M tokens ($0.19 / 1M tokens cached)
Output price: $4.00 / 1M tokens
Input modalities: Text, Image, Video
Output modalities: Text
Released: June 12, 2026
Last updated: June 12, 2026
Category: Chat
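
The listed prices can be turned into a rough per-request estimate. This is a sketch under stated assumptions: cached tokens are treated as a subset of the input tokens and billed at the cached rate, and thinking tokens are assumed to be counted as output tokens. Neither billing detail is confirmed on this page, and `estimate_cost` is a hypothetical helper.

```python
# USD per 1M tokens, copied from the pricing listed above.
PRICE_PER_M = {"input": 0.95, "cached_input": 0.19, "output": 4.00}


def estimate_cost(input_tokens, output_tokens, cached_input_tokens=0):
    """Estimate the USD cost of one request.

    Assumes cached_input_tokens is the part of input_tokens billed at the cached rate.
    """
    if not 0 <= cached_input_tokens <= input_tokens:
        raise ValueError("cached_input_tokens must be between 0 and input_tokens")
    fresh_tokens = input_tokens - cached_input_tokens
    usd = (
        fresh_tokens * PRICE_PER_M["input"]
        + cached_input_tokens * PRICE_PER_M["cached_input"]
        + output_tokens * PRICE_PER_M["output"]
    ) / 1_000_000
    return round(usd, 6)


# Example: a 200K-token prompt with 150K tokens cached, producing 20K output tokens.
print(estimate_cost(200_000, 20_000, cached_input_tokens=150_000))
```

Since output tokens cost roughly four times as much as uncached input tokens, the reduction in thinking tokens is likely to matter most for sessions that produce a lot of reasoning.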
