MiniMax M2.5
Production-scale agentic coding with full-stack development and office deliverables
About the model
MiniMax M2.5 is a state-of-the-art (SOTA) model for coding, agentic tool use, search, and office work, extensively trained with reinforcement learning across 200,000+ complex real-world environments. It achieves 80.2% on SWE-Bench Verified while completing tasks 37% faster than M2.1, matching Claude Opus 4.6's speed. M2.5 exhibits architect-level planning: it actively decomposes and plans features, structure, and UI design before writing code, spanning the entire development lifecycle from 0-to-1 system design through 90-to-100 comprehensive testing. Trained on 10+ programming languages across full-stack platforms (Web, Android, iOS, Windows), M2.5 also produces directly deliverable office outputs, served on Together AI's production infrastructure.
80.2%
SWE-Bench Verified: SOTA coding, trained across 200K+ real-world environments
37%
Faster than M2.1, matching Opus 4.6 speed through efficient decomposition
200K+
RL training environments spanning coding, search, and office work
- Architect-Level Planning: Spec-writing with feature decomposition and UI design before coding—spanning 0-to-1 system design through 90-to-100 comprehensive testing
- SOTA Agentic Coding: 80.2% SWE-Bench Verified across 10+ languages and full-stack platforms—37% faster than M2.1, matching Opus 4.6 speed
- Office Deliverables: Word documents, PowerPoint presentations, and Excel models, with training shaped by industry experts—59.0% win rate vs mainstream models
- Production-Ready Infrastructure: 99.9% SLA, available on serverless and dedicated infrastructure
API usage
Endpoint:
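As a quick-start sketch, the model can be called through Together AI's OpenAI-compatible chat completions endpoint. The model identifier string below is an assumption for illustration; check the endpoint field on this page for the exact name.

```python
# Sketch of a single-turn chat completion request against Together AI's
# OpenAI-compatible endpoint. MODEL_ID is a hypothetical identifier --
# substitute the exact model name shown on the model page.
import json

TOGETHER_ENDPOINT = "https://api.together.xyz/v1/chat/completions"
MODEL_ID = "minimax-ai/MiniMax-M2.5"  # assumption, not confirmed by this page

def build_chat_request(prompt: str, max_tokens: int = 1024) -> dict:
    """Assemble the JSON body for a single-turn chat completion."""
    return {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

payload = build_chat_request("Write a unit test for a FizzBuzz function.")
body = json.dumps(payload)
# POST `body` to TOGETHER_ENDPOINT with an "Authorization: Bearer <API_KEY>"
# header, e.g. requests.post(TOGETHER_ENDPOINT, data=body, headers=...).
```

The same payload works with the official OpenAI Python client pointed at Together's base URL, since the request schema is OpenAI-compatible.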
Model card
Architecture Overview:
• SOTA agentic model trained with reinforcement learning across 200,000+ complex real-world environments
• Forge agent-native RL framework with 40x training speedup through asynchronous scheduling and tree-structured sample merging
• CISPO algorithm ensuring MoE model stability during large-scale RL training
• Process reward mechanism for end-to-end generation quality monitoring in long-context agent rollouts
• Optimal trade-off between intelligence and response speed through trajectory-based task completion time evaluation
• Trained on 10+ programming languages: Go, C, C++, TypeScript, Rust, Kotlin, Python, Java, JavaScript, PHP, Lua, Dart, Ruby
Training Methodology:
• Extensive RL training in hundreds of thousands of real-world coding, search, and office work environments
• Collaboration with senior professionals in finance, law, and social sciences for office deliverables training
• Industry expert-designed requirements, feedback, and standards contributing to data construction
• Architect-level planning emerged during training: spec-writing before coding with feature decomposition
• Trained for efficient reasoning and optimal task decomposition reducing token consumption by 5% vs M2.1
• Full development lifecycle training: 0-to-1 system design, 1-to-10 development, 10-to-90 iteration, 90-to-100 testing
Performance Characteristics:
• Coding Excellence: 80.2% SWE-Bench Verified, 51.3% Multi-SWE-Bench, 79.7% Droid, 76.1% OpenCode
• Agentic Leadership: 76.3% BrowseComp (with context management), 20% fewer search rounds vs M2.1
• Office Deliverables: 59.0% win rate in GDPval-MM evaluation vs mainstream models
• Speed: 37% faster than M2.1 on SWE-Bench Verified (22.8 min vs 31.3 min), matching Claude Opus 4.6
• Cost Efficiency: 10% of Claude Opus 4.6's per-task cost, $1/hour continuous operation at 100 TPS
• Token Efficiency: 3.52M tokens/task vs M2.1's 3.72M, 5% reduction through better decomposition
• Additional Benchmarks: 86.3% AIME25, 85.2% GPQA-D, 70.0% IFBench, 44.4% SciCode
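The cost and token-efficiency figures above can be sanity-checked against the listed prices ($0.30/M input, $0.06/M cached input, $1.20/M output). The 70/30 input/output split in the sketch below is an illustrative assumption, not a number stated on this page.

```python
# Back-of-the-envelope per-task cost from the listed Together AI prices.
# The input/output token split is an assumption for illustration.

INPUT_PRICE = 0.30 / 1_000_000   # $ per fresh input token
CACHED_PRICE = 0.06 / 1_000_000  # $ per cached input token
OUTPUT_PRICE = 1.20 / 1_000_000  # $ per output token

def task_cost(input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    """Estimate the dollar cost of one task from its token counts."""
    fresh = input_tokens - cached_tokens
    return fresh * INPUT_PRICE + cached_tokens * CACHED_PRICE + output_tokens * OUTPUT_PRICE

# 3.52M tokens/task is the reported M2.5 average; assume 70% input, 30% output:
avg = task_cost(input_tokens=2_464_000, output_tokens=1_056_000)
```

Under that assumed split, a 3.52M-token task lands at roughly two dollars; cached input reduces the bill further since cached tokens are billed at a fifth of the fresh rate.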
Applications & use cases
Full-Stack Software Development:
• Architect-level planning: Spec-writing with feature decomposition, structure design, and UI planning before coding
• Complete development lifecycle: 0-to-1 system design and environment setup through 90-to-100 comprehensive testing
• 80.2% SWE-Bench Verified, 51.3% Multi-SWE-Bench across 10+ programming languages
• Full-stack platforms: Web, Android, iOS, Windows with server-side APIs, business logic, databases
• Complex system development beyond bug-fixing: feature iteration, code review, system testing
• Multi-environment generalization: 79.7% on Droid, 76.1% on OpenCode with different scaffoldings
Agentic Search & Tool Use:
• Industry-leading performance: 76.3% BrowseComp with context management
• Expert-level search tasks: RISE benchmark evaluating real-world professional research capabilities
• Efficient decision-making: 20% fewer search rounds than M2.1 with better token efficiency
• Fewer, more targeted search rounds with shorter reasoning paths to results
• Stable performance across unfamiliar scaffolding environments
• Deep webpage exploration for information-dense professional tasks
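The search behavior described above follows the standard agentic tool-use loop: the model proposes a tool call, the harness executes it, and the observation is fed back until the model answers. A minimal mock sketch (no real model or web access; every function here is a stand-in, and a real harness would call the M2.5 API instead):

```python
# Minimal mock of an agentic search loop. `fake_model` and `fake_search`
# are illustrative stand-ins for the model policy and a search backend.

def fake_model(transcript):
    """Stand-in policy: search once, then answer from the observation."""
    if not any(kind == "observation" for kind, _ in transcript):
        return {"action": "search", "query": "MiniMax M2.5 SWE-Bench score"}
    return {"action": "answer", "text": "80.2% SWE-Bench Verified"}

def fake_search(query):
    """Stand-in search tool returning a canned snippet."""
    return f"snippet for {query!r}: M2.5 scores 80.2% on SWE-Bench Verified"

def run_agent(model, tools, max_rounds=8):
    """Alternate model steps and tool executions until the model answers."""
    transcript = [("task", "look up the score")]
    for _ in range(max_rounds):
        step = model(transcript)
        if step["action"] == "answer":
            return step["text"], transcript
        observation = tools[step["action"]](step["query"])
        transcript.append(("observation", observation))
    raise RuntimeError("agent exceeded round budget")

answer, transcript = run_agent(fake_model, {"search": fake_search})
```

The "20% fewer search rounds" claim maps onto `max_rounds` here: a more efficient policy reaches the `answer` branch with fewer trips through the loop, which directly cuts token consumption.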
Office Deliverables & Productivity:
• Word documents, PowerPoint presentations, Excel financial models as truly deliverable outputs
• Trained with senior professionals in finance, law, and social sciences
• 59.0% win rate vs mainstream models in GDPval-MM office work evaluation
• Industry-specific tacit knowledge integrated into training pipeline
• High-value workspace scenarios: financial modeling, legal documents, research reports
• Professional trajectory evaluation alongside deliverable quality assessment
Enterprise Coding Agents:
• Autonomous software development at production scale
• Multi-language, multi-platform development workflows
• Integration with Claude Code and major coding agent frameworks
• Repository-scale navigation, refactoring, and comprehensive testing
• Real-world deployment: 80% of MiniMax's newly committed code is M2.5-generated
Knowledge Work Automation:
• Automated research report generation with proper formatting
• Financial model creation following organizational standards
• Legal document preparation with industry compliance
• Presentation creation with professional design standards
• Real-world productivity: 30% of MiniMax company tasks autonomously completed by M2.5
- Model provider: MiniMax AI
- Type: Reasoning, Code
- Main use cases: Chat, Function Calling
- Features: Function Calling, JSON Mode
- Speed: Medium
- Intelligence: Very High
- Deployment: Serverless, Monthly Reserved
- Endpoint:
- Parameters: 228.7B
- Context length: 192K
- Input price: $0.30 / 1M tokens ($0.06 / 1M cached)
- Output price: $1.20 / 1M tokens
- Input modalities: Text
- Output modalities: Text
- Released: February 11, 2026
- Last updated: February 14, 2026
- Quantization level: FP4
- External link:
- Category: Chat
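For the Function Calling feature listed in the spec table, requests follow the OpenAI-compatible `tools` schema. The tool definition and model identifier below are illustrative assumptions, not values taken from this page.

```python
# Sketch of a function-calling request body in the OpenAI-compatible
# schema. The `multiply` tool and the model identifier are hypothetical.
import json

payload = {
    "model": "minimax-ai/MiniMax-M2.5",  # assumption; use the exact model name
    "messages": [
        {"role": "user", "content": "What is 12 times 19?"}
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "multiply",
                "description": "Multiply two numbers.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "a": {"type": "number"},
                        "b": {"type": "number"},
                    },
                    "required": ["a", "b"],
                },
            },
        }
    ],
    "tool_choice": "auto",  # let the model decide whether to call the tool
}
body = json.dumps(payload)
```

When the model elects to call the tool, the response carries a `tool_calls` entry whose arguments are returned as a JSON string; the harness executes the call and sends the result back in a `role: "tool"` message.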