NVIDIA Nemotron 3 Nano
Most efficient agentic AI model with hybrid Mamba-Transformer MoE and 1 million-token context
About the model
- 1M: million-token context for long-duration agents
- 4x: faster token generation than Nemotron 2, aided by multi-token prediction (MTP)
- 3B: active parameters per token, from a 30B-parameter total MoE architecture
- Leading Agentic Benchmarks: SWE-bench, GPQA Diamond, AIME 2025, HLE, IFBench, RULER
- Hybrid Architecture: Mamba-Transformer MoE with multi-token prediction (MTP)
- Extreme Context: 1 million tokens for extended agent coherence and cross-document reasoning
- Fully Open Source: Model weights, 10T token datasets, recipes, and NeMo Gym tools
| Model | AIME 2025 | GPQA Diamond | HLE | LiveCodeBench | MATH500 | SWE-bench Verified |
|---|---|---|---|---|---|---|
| NVIDIA Nemotron 3 Nano | 89.1% | 73.0% | 10.6% | 68.3% | | |
Model card
Architecture Overview:
• Hybrid Mamba-Transformer mixture-of-experts (MoE) architecture with 30B total parameters and 3B active per token.
• 1 million-token context window enabling extended agent coherence and multi-document reasoning for complex operations.
• Multi-token prediction (MTP) generates multiple future tokens simultaneously in one forward pass for faster sequence generation.
• Latent MoE captures more nuanced patterns and handles diverse inputs better than traditional architectures.
• Thinking budget feature avoids overthinking and optimizes for lower, predictable inference cost.
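The 3B-active-of-30B design means only a small subset of experts runs for any given token, so per-token compute scales with the active parameters rather than the total. A minimal NumPy sketch of top-k expert routing illustrates the idea; all sizes and the expert count here are made-up toy values, not the model's real configuration:

```python
import numpy as np

# Toy sparse mixture-of-experts routing (illustrative only; sizes are
# hypothetical and far smaller than the real model).
NUM_EXPERTS = 10      # hypothetical expert count
TOP_K = 1             # experts activated per token
D_MODEL = 64          # hypothetical hidden size

rng = np.random.default_rng(0)
# One weight matrix per expert; only TOP_K of them run per token.
experts = [rng.standard_normal((D_MODEL, D_MODEL)) for _ in range(NUM_EXPERTS)]
router = rng.standard_normal((D_MODEL, NUM_EXPERTS))

def moe_forward(x):
    """Route a single token vector to its top-k experts and mix their outputs."""
    logits = x @ router
    top = np.argsort(logits)[-TOP_K:]                    # chosen expert indices
    gates = np.exp(logits[top]) / np.exp(logits[top]).sum()
    return sum(g * (x @ experts[i]) for g, i in zip(gates, top)), top

x = rng.standard_normal(D_MODEL)
y, chosen = moe_forward(x)

total_params = NUM_EXPERTS * D_MODEL * D_MODEL
active_params = TOP_K * D_MODEL * D_MODEL
print(f"total: {total_params}, active per token: {active_params}")
```

With TOP_K of 1 out of 10 experts, each token touches a tenth of the expert parameters, which is the same ratio (3B active of 30B total) the model card describes.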
Training Methodology:
• Multi-environment reinforcement learning (RL) training across 10 environments using open-sourced NVIDIA NeMo Gym.
• Trained on 10 trillion tokens of curated datasets covering code, scientific reasoning, math, function calling, and instruction following.
• Post-training RL achieves leading accuracy by training across diverse environments for robust agentic behavior.
• NVFP4 quantization applied to deliver high throughput and compute efficiency while maintaining accuracy.
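The intuition behind low-precision formats like NVFP4 is blockwise quantization: each small block of weights shares a scale chosen to fit that block's range, keeping error low while storing each value in 4 bits. The sketch below simulates a simplified integer variant of this idea; it is not the actual NVFP4 floating-point format, and the block size is an arbitrary choice:

```python
import numpy as np

# Toy blockwise 4-bit quantization (simplified int4 simulation, not the
# real NVFP4 spec): each block of 16 weights shares one scale factor.
def quantize_block(w, block=16):
    w = w.reshape(-1, block)
    scale = np.abs(w).max(axis=1, keepdims=True) / 7.0   # int4 range: -8..7
    scale[scale == 0] = 1.0                              # avoid divide-by-zero
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize_block(q, scale):
    return (q * scale).reshape(-1)

rng = np.random.default_rng(1)
w = rng.standard_normal(256).astype(np.float32)
q, s = quantize_block(w)
w_hat = dequantize_block(q, s)
print(f"max abs error: {np.abs(w - w_hat).max():.3f}")  # small vs. weight scale
```

Because the scale adapts per block, the worst-case rounding error stays a small fraction of each block's largest weight, which is how aggressive quantization can preserve accuracy while cutting memory and bandwidth.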
Performance Characteristics:
• 4x higher token generation efficiency compared to Nemotron 2 via hybrid Mamba-Transformer MoE architecture.
• Leading accuracy on agentic benchmarks: SWE-bench coding, GPQA Diamond scientific reasoning, AIME 2025 math.
• Strong function calling and instruction following, reflected in IFBench and Arena Hard performance.
• Extreme long-context capability: strong RULER results at the 1M-token context length, supporting RAG pipelines.
• Best-in-class among open models with 30B or fewer MoE parameters across targeted agentic tasks.
Applications & use cases
Agentic AI Applications:
• Building reliable, high-throughput AI agents for complex, multi-step operations and long-duration tasks.
• Multi-agent collaboration systems requiring extended conversation history and plan state retention.
• Autonomous agents for customer service, IT ticket automation, and business process orchestration.
Code & Development:
• Software engineering workflows of the kind SWE-bench measures: code review, summarization, debugging, and generation with long context.
• Long-form code understanding and generation across multiple files and repositories.
• Software engineering agents for automated code maintenance and refactoring.
Enterprise & Compliance:
• Financial transaction compliance monitoring and cybersecurity fraud detection, relying on high-accuracy instruction following.
• Automated generation of long-form business reports from many inputs (dashboards, logs, memos, meeting notes).
• Compliance report generation maintaining consistency across sections with 1M token context.
RAG & Document Processing:
• Retrieval-augmented generation pipelines processing extensive multi-document contexts.
• Cross-document reasoning and synthesis across hundreds of documents simultaneously.
• Long-form content generation that stays coherent across the full context, as measured by RULER.
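One practical pattern for filling a 1M-token window in a RAG pipeline is greedy packing: add ranked documents until the budget is spent, reserving headroom for instructions and generation. A toy sketch follows; the reserve size is an arbitrary assumption and the whitespace split is a crude stand-in for a real tokenizer:

```python
# Rough sketch of packing retrieved documents into a 1M-token context
# window. Token counts use a whitespace heuristic, not a real tokenizer.
CONTEXT_LIMIT = 1_000_000
RESERVED = 16_384            # hypothetical budget for instructions + output

def pack_documents(docs, limit=CONTEXT_LIMIT - RESERVED):
    """Greedily add relevance-ranked documents until the token budget is spent."""
    packed, used = [], 0
    for doc in docs:                 # docs assumed sorted by relevance
        cost = len(doc.split())      # crude proxy for token count
        if used + cost > limit:
            break
        packed.append(doc)
        used += cost
    return packed, used

docs = ["alpha beta gamma"] * 5
packed, used = pack_documents(docs, limit=10)
print(len(packed), used)  # 3 docs fit, 9 "tokens" used
```

At a genuine 1M-token limit the same loop admits hundreds of full documents, which is what enables cross-document synthesis in a single call.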
Function Calling & Integration:
• Complex function calling and API orchestration for enterprise system integration.
• Tool use and multi-step reasoning for autonomous workflow execution.
• High-accuracy instruction following for mission-critical business applications.
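For function calling, requests to models like this typically follow the OpenAI-style tool-calling schema: the caller declares available tools as JSON Schema, and the model decides when to invoke them. The sketch below only constructs such a payload; the model identifier and the `create_ticket` tool are hypothetical, so check the actual serving documentation for the deployed names:

```python
import json

# Shape of an OpenAI-style tool-calling request (no network call is made).
# The model id and tool definition below are illustrative assumptions.
request = {
    "model": "nvidia/nemotron-3-nano",   # hypothetical identifier
    "messages": [
        {"role": "user", "content": "Open an IT ticket for a VPN outage."}
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "create_ticket",           # hypothetical tool
                "description": "File an IT support ticket.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "summary": {"type": "string"},
                        "priority": {
                            "type": "string",
                            "enum": ["low", "medium", "high"],
                        },
                    },
                    "required": ["summary"],
                },
            },
        }
    ],
}
payload = json.dumps(request)  # body to POST to a chat-completions endpoint
```

The model's reply would then carry a structured tool call (function name plus JSON arguments) that the orchestrating agent executes before continuing the conversation.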
- Model provider: NVIDIA
- Type: Chat
- Main use cases: Chat, Small & Fast, Medium General Purpose
- Deployment: On-Demand Dedicated, Monthly Reserved
- Parameters: 31.6B
- Context length: 1M
- Input modalities: Text
- Output modalities: Text
- Released: December 3, 2025
- Category: Chat