MiniMax M2.7
Production-scale software engineering with long-horizon agentic execution and native Agent Teams
About the model
MiniMax M2.7 is the first model to meaningfully participate in its own development. An internal version autonomously ran 100+ optimization rounds — analyzing failure trajectories, modifying code, evaluating results, and deciding to keep or revert — achieving a 30% improvement on internal programming benchmarks.
On SWE-Pro, M2.7 scores 56.22%, matching GPT-5.3-Codex, and reaches 55.6% on VIBE-Pro (near Opus 4.6) for end-to-end project delivery across Web, Android, and iOS. On MLE Bench Lite, M2.7 achieved a 66.6% medal rate, second only to Opus-4.6 and GPT-5.4. Native Agent Teams enable stable multi-agent collaboration with role identity and autonomous decision-making across complex state machines, and the model reaches 97% skill compliance across 40+ complex skills on Together AI's production infrastructure.
- 56.22% (SWE-Pro): software engineering across multilingual, real-world codebases
- 66.6% (MLE Bench Lite medal rate): second only to Opus-4.6 and GPT-5.4 across 22 ML competitions
- 100+ optimization rounds: self-directed RL loop achieving a 30% improvement on internal benchmarks
- Software engineering: 56.22% SWE-Pro matching GPT-5.3-Codex; 76.5 SWE Multilingual; 55.6% VIBE-Pro near Opus 4.6 for end-to-end project delivery across Web, Android, and iOS
- Model self-evolution: Autonomously ran 100+ optimization rounds achieving 30% performance improvement; 66.6% MLE Bench Lite medal rate, second only to Opus-4.6 and GPT-5.4
- Native agent teams: Multi-agent collaboration with stable role identity and autonomous decision-making; 97% skill compliance across 40+ complex skills (each 2,000+ tokens)
- Professional work: ELO 1495 on GDPval-AA, highest among open-source models; high-fidelity multi-round editing for Word, Excel, and PPT
- Production-ready infrastructure: 99.9% SLA, serverless and dedicated infrastructure on the AI Native Cloud
API usage
Endpoint:
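Endpoints like this are typically OpenAI-compatible chat completions APIs. A minimal request sketch, assuming that payload shape; the endpoint URL and model identifier below are placeholders, not confirmed values from this page — verify both in the provider console:

```python
import json

# Hypothetical values -- check the provider console for the real
# endpoint URL and model identifier before sending any request.
ENDPOINT = "https://api.together.xyz/v1/chat/completions"  # assumed OpenAI-compatible
MODEL = "minimax/MiniMax-M2.7"  # hypothetical model ID

def build_request(prompt: str, max_tokens: int = 1024) -> dict:
    """Assemble an OpenAI-style chat completions payload."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

payload = build_request("Summarize this stack trace and propose a fix.")
print(json.dumps(payload, indent=2))
```

POST this payload as JSON to the endpoint with your API key in the `Authorization: Bearer …` header.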
Model card
Training through self-optimization
- During development, M2.7 participated in its own training: updating its own memory, building complex skills for RL experiments, and improving its learning process based on results
- During training, an internal version autonomously optimized a programming scaffold over 100+ rounds — analyzing failure trajectories, modifying code, running evaluations, and deciding to keep or revert — achieving a 30% performance improvement
- MLE Bench Lite (22 ML competitions): 66.6% medal rate, second only to Opus-4.6 (75.7%) and GPT-5.4 (71.2%), tying with Gemini-3.1
Professional software engineering
- SWE-Pro: 56.22%, matching GPT-5.3-Codex across multiple programming languages
- VIBE-Pro: 55.6%, near Opus 4.6 — end-to-end project delivery across Web, Android, iOS, and simulation
- SWE Multilingual: 76.5 | Multi SWE Bench: 52.7 | Terminal Bench 2: 57.0% | NL2Repo: 39.8%
- Native agent teams with stable role identity and autonomous decision-making across complex state machines
- System-level reasoning: Correlates monitoring metrics, conducts trace analysis, verifies root causes in databases, makes SRE-level decisions — live production incident recovery reduced to under three minutes
Professional work
- GDPval-AA ELO: 1495 — highest among open-source models, surpassing GPT-5.3
- High-fidelity multi-round editing for Word, Excel, and PPT, producing editable deliverables
- Toolathon: 46.3% accuracy, global top tier
- MM Claw: 62.7%, close to Sonnet 4.6 | 97% skill compliance across 40+ complex skills (each exceeding 2,000 tokens)
Applications & use cases
Professional software engineering:
- SWE-Pro: 56.22%, matching GPT-5.3-Codex across multiple programming languages
- End-to-end project delivery: 55.6% VIBE-Pro, near Opus 4.6, across Web, Android, iOS, and simulation tasks
- System-level reasoning: correlates monitoring metrics, conducts trace analysis, verifies root causes in databases, and makes SRE-level decisions
- Real-world incident recovery reduced to under three minutes
- Terminal Bench 2: 57.0% | SWE Multilingual: 76.5 | NL2Repo: 39.8%
Long-horizon agentic execution:
- Sustains progress across hundreds of rounds and thousands of tool calls
- 66.6% medal rate on MLE Bench Lite (22 ML competitions), second only to Opus-4.6 and GPT-5.4
- Trained via recursive self-optimization: 100+ autonomous rounds of analyze → modify → evaluate → keep or revert during development
- 30% improvement achieved through that self-directed training loop
Native Agent Teams:
- Multi-agent collaboration with stable role identity and autonomous decision-making
- Adversarial reasoning, protocol adherence, and behavioral differentiation as native model capabilities
- 97% skill compliance across 40+ complex skills, each exceeding 2,000 tokens
- MM Claw: 62.7%, close to Sonnet 4.6
Professional work:
- GDPval-AA ELO: 1495, highest among open-source models, surpassing GPT-5.3
- High-fidelity multi-round editing for Word, Excel, and PPT producing editable deliverables
- Toolathon: 46.3% accuracy, global top tier
- Financial modeling: reads annual reports, cross-references research reports, builds revenue forecast models and PPT/Word deliverables autonomously
- Model provider: MiniMax AI
- Type: Reasoning, Code
- Main use cases: Chat, Coding Agents, Function Calling, Reasoning
- Features: Function Calling, JSON Mode, Prompt Caching
- Speed: Medium
- Intelligence: Very High
- Deployment: Monthly Reserved, Serverless
- Endpoint
- Parameters: 229B
- Context length: 228,700 tokens
- Input price: $0.30 / 1M tokens ($0.06 / 1M cached)
- Output price: $1.20 / 1M tokens
- Input modalities: Text
- Output modalities: Text
- Released: April 11, 2026
- Quantization level: FP4
- Category: Chat
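Given the listed prices, estimating per-request cost is simple arithmetic. A quick sketch using the rates from the pricing fields above; it assumes cached tokens replace (rather than add to) regular input tokens, which is a common but unverified billing convention:

```python
# Rates from the listing above, in USD per 1M tokens.
INPUT_PER_M = 0.30
CACHED_INPUT_PER_M = 0.06
OUTPUT_PER_M = 1.20

def request_cost(input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    """Estimate request cost in USD; cached_tokens are the portion of
    input_tokens billed at the cheaper cached rate."""
    fresh = input_tokens - cached_tokens
    return (fresh * INPUT_PER_M
            + cached_tokens * CACHED_INPUT_PER_M
            + output_tokens * OUTPUT_PER_M) / 1_000_000

# e.g. 100k input tokens (half served from cache) + 20k output tokens
print(f"${request_cost(100_000, 20_000, cached_tokens=50_000):.4f}")  # prints $0.0420
```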