
MiniMax M2.7

Production-scale software engineering with long-horizon agentic execution and native Agent Teams

About model

MiniMax M2.7 is the first model to meaningfully participate in its own development. An internal version autonomously ran 100+ optimization rounds — analyzing failure trajectories, modifying code, evaluating results, and deciding to keep or revert — achieving a 30% improvement on internal programming benchmarks.

On SWE-Pro, M2.7 scores 56.22%, matching GPT-5.3-Codex, with 55.6% on VIBE-Pro (near Opus 4.6) for end-to-end project delivery across Web, Android, and iOS. On MLE Bench Lite, M2.7 achieved a 66.6% medal rate — second only to Opus-4.6 and GPT-5.4. Native agent teams enable stable multi-agent collaboration with role identity and autonomous decision-making across complex state machines, with 97% skill compliance across 40+ complex skills on Together AI's production infrastructure.

  • SWE-Pro: 56.22% (software engineering across multilingual, real-world codebases)
  • MLE Bench Lite medal rate: 66.6% (second only to Opus-4.6 and GPT-5.4 across 22 ML competitions)
  • Autonomous optimization rounds: 100+ (self-directed RL loop achieving 30% improvement on internal benchmarks)

Model key capabilities
  • Software engineering: 56.22% on SWE-Pro, matching GPT-5.3-Codex; 76.5 on SWE Multilingual; 55.6% on VIBE-Pro, near Opus 4.6, for end-to-end project delivery across Web, Android, and iOS
  • Model self-evolution: Autonomously ran 100+ optimization rounds achieving 30% performance improvement; 66.6% MLE Bench Lite medal rate, second only to Opus-4.6 and GPT-5.4
  • Native agent teams: Multi-agent collaboration with stable role identity and autonomous decision-making; 97% skill compliance across 40+ complex skills (each 2,000+ tokens)
  • Professional work: ELO 1495 on GDPval-AA, highest among open-source models; high-fidelity multi-round editing for Word, Excel, and PPT
  • Production-ready infrastructure: 99.9% SLA, serverless and dedicated infrastructure on the AI Native Cloud
  • API usage

    • cURL
    • Python
    • TypeScript

    Endpoint: MiniMaxAI/MiniMax-M2.7

    cURL:

    curl -X POST "https://api.together.xyz/v1/chat/completions" \
      -H "Authorization: Bearer $TOGETHER_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{
        "model": "MiniMaxAI/MiniMax-M2.7",
        "messages": [
          {
            "role": "user",
            "content": "What are some fun things to do in New York?"
          }
        ]
      }'
    
    Python:

    from together import Together
    
    client = Together()
    
    response = client.chat.completions.create(
      model="MiniMaxAI/MiniMax-M2.7",
      messages=[
        {
          "role": "user",
          "content": "What are some fun things to do in New York?"
        }
      ]
    )
    print(response.choices[0].message.content)
    
    TypeScript:

    import Together from 'together-ai';
    const together = new Together();
    
    const completion = await together.chat.completions.create({
      model: 'MiniMaxAI/MiniMax-M2.7',
      messages: [
        {
          role: 'user',
          content: 'What are some fun things to do in New York?'
        }
      ],
    });
    
    console.log(completion.choices[0].message.content);
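Beyond plain chat, Function Calling is among this model's listed features. Below is a minimal, hypothetical Python sketch of a tool-calling round trip through the same chat completions endpoint, assuming Together's OpenAI-compatible `tools` parameter; the `get_weather` tool, its schema, and its stub return value are illustrative, and the live request requires a TOGETHER_API_KEY:

```python
import json

# OpenAI-style tool schema (illustrative) that the model may choose to call.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

def dispatch(name: str, arguments: str) -> str:
    """Route a tool call returned by the model to a local stub implementation."""
    args = json.loads(arguments)
    if name == "get_weather":
        return json.dumps({"city": args["city"], "temp_c": 21})  # stub value
    raise ValueError(f"unknown tool: {name}")

def run() -> str:
    # Imported here so the module loads even without the SDK installed.
    from together import Together

    client = Together()  # reads TOGETHER_API_KEY from the environment
    response = client.chat.completions.create(
        model="MiniMaxAI/MiniMax-M2.7",
        messages=[{"role": "user", "content": "What's the weather in NYC?"}],
        tools=tools,
    )
    # Assumes an OpenAI-compatible tool_calls shape on the response message.
    call = response.choices[0].message.tool_calls[0]
    return dispatch(call.function.name, call.function.arguments)
```

In a full loop, the string returned by `dispatch` would be sent back as a `role: "tool"` message so the model can compose its final answer; this sketch stops at the first tool call.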
    
  • Model card

    Training through self-optimization

    • During development, M2.7 participated in its own training: updating its own memory, building complex skills for RL experiments, and improving its learning process based on results
    • During training, an internal version autonomously optimized a programming scaffold over 100+ rounds — analyzing failure trajectories, modifying code, running evaluations, and deciding to keep or revert — achieving a 30% performance improvement
    • MLE Bench Lite (22 ML competitions): 66.6% medal rate, second only to Opus-4.6 (75.7%) and GPT-5.4 (71.2%), tying with Gemini-3.1

    Professional software engineering

    • SWE-Pro: 56.22%, matching GPT-5.3-Codex across multiple programming languages
    • VIBE-Pro: 55.6%, near Opus 4.6 — end-to-end project delivery across Web, Android, iOS, and simulation
    • SWE Multilingual: 76.5 | Multi SWE Bench: 52.7 | Terminal Bench 2: 57.0% | NL2Repo: 39.8%
    • Native agent teams with stable role identity and autonomous decision-making across complex state machines
    • System-level reasoning: Correlates monitoring metrics, conducts trace analysis, verifies root causes in databases, makes SRE-level decisions — live production incident recovery reduced to under three minutes

    Professional work

    • GDPval-AA ELO: 1495 — highest among open-source models, surpassing GPT-5.3
    • High-fidelity multi-round editing for Word, Excel, and PPT, producing editable deliverables
    • Toolathon: 46.3% accuracy, global top tier
    • MM Claw: 62.7%, close to Sonnet 4.6 | 97% skill compliance across 40+ complex skills (each exceeding 2,000 tokens)

  • Applications & use cases

    Professional software engineering:

    • SWE-Pro: 56.22%, matching GPT-5.3-Codex across multiple programming languages
    • End-to-end project delivery: 55.6% VIBE-Pro, near Opus 4.6—Web, Android, iOS, and simulation tasks
    • System-level reasoning: correlates monitoring metrics, conducts trace analysis, verifies root causes in databases, and makes SRE-level decisions
    • Real-world incident recovery reduced to under three minutes
    • Terminal Bench 2: 57.0% | SWE Multilingual: 76.5 | NL2Repo: 39.8%

    Long-horizon agentic execution:

    • Sustains progress across hundreds of rounds and thousands of tool calls
    • 66.6% medal rate on MLE Bench Lite (22 ML competitions)—second only to Opus-4.6 and GPT-5.4
    • Trained via recursive self-optimization: 100+ autonomous rounds of analyze → modify → evaluate → keep or revert during development
    • 30% improvement achieved through that self-directed training loop

    Native Agent Teams:

    • Multi-agent collaboration with stable role identity and autonomous decision-making
    • Adversarial reasoning, protocol adherence, and behavioral differentiation as native model capabilities
    • 97% skill compliance across 40+ complex skills, each exceeding 2,000 tokens
    • MM Claw: 62.7%, close to Sonnet 4.6

    Professional work:

    • GDPval-AA ELO: 1495—highest among open-source models, surpassing GPT-5.3
    • High-fidelity multi-round editing for Word, Excel, and PPT producing editable deliverables
    • Toolathon: 46.3% accuracy, global top tier
    • Financial modeling: reads annual reports, cross-references research reports, builds revenue forecast models and PPT/Word deliverables autonomously
  • Model provider
    Minimax AI
  • Type
    Reasoning
    Code
  • Main use cases
    Chat
    Coding Agents
    Function Calling
    Reasoning
  • Features
    Function Calling
    JSON Mode
    Prompt Caching
  • Speed
    Medium
  • Intelligence
    Very High
  • Deployment
    Monthly Reserved
    Serverless
  • Parameters
    229B
  • Context length
    228,700
  • Input price

    $0.30 / 1M tokens

    $0.06 / 1M tokens (cached)

  • Output price

    $1.20 / 1M tokens

  • Input modalities
    Text
  • Output modalities
    Text
  • Released
    April 11, 2026
  • Quantization level
    FP4
  • Category
    Chat
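The listed prices translate into straightforward per-request arithmetic. A small sketch using the prices above ($0.30 / 1M input, $0.06 / 1M cached input, $1.20 / 1M output); the token counts in the example are made up:

```python
# Per-request cost estimate for MiniMax-M2.7 on Together AI, using the
# listed prices. Token counts in the example are illustrative.
INPUT_PER_M = 0.30    # $ per 1M fresh input tokens
CACHED_PER_M = 0.06   # $ per 1M cached input tokens
OUTPUT_PER_M = 1.20   # $ per 1M output tokens

def request_cost(input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    """Dollar cost of one request; cached_tokens is the prompt prefix served from cache."""
    fresh = input_tokens - cached_tokens
    cost = (fresh * INPUT_PER_M
            + cached_tokens * CACHED_PER_M
            + output_tokens * OUTPUT_PER_M) / 1_000_000
    return round(cost, 6)

# e.g. a 200k-token prompt with a 150k-token cached prefix and 2k tokens out:
# request_cost(200_000, 2_000, cached_tokens=150_000)  # -> 0.0264
```

Prompt caching matters at this context length: in the example, the cached prefix cuts the input cost from $0.06 to $0.024 for that request.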