
NVIDIA Nemotron 3 Nano

Most efficient agentic AI model with hybrid Mamba-Transformer MoE and 1 million-token context

About model

NVIDIA Nemotron 3 Nano is the most efficient, accuracy-leading open model for building agentic AI applications at production scale. Built on a hybrid Mamba-Transformer mixture-of-experts (MoE) architecture with a 1 million-token context length, it lets developers deploy reliable, high-throughput agents for complex, multi-document, and long-duration operations, with 4x higher token generation efficiency than Nemotron 2.
  • Context length: 1M (million-token context for long-duration agents)
  • Token generation: 4x (faster than Nemotron 2 via multi-token prediction)
  • Active parameters: 3B (active per token, from a 30B-total MoE architecture)

Model key capabilities
  • Leading Agentic Benchmarks: SWE Bench, GPQA Diamond, AIME 2025, HLE, IFBench, RULER
  • Hybrid Architecture: Mamba-Transformer MoE with multi-token prediction (MTP)
  • Extreme Context: 1 million tokens for extended agent coherence and cross-document reasoning
  • Fully Open Source: Model weights, 10T token datasets, recipes, and NeMo Gym tools
Performance benchmarks

Benchmarks reported: AIME 2025, GPQA Diamond, HLE, LiveCodeBench, MATH500, SWE-bench Verified (not every model reports every benchmark).
  • NVIDIA Nemotron 3 Nano: 89.1%, 73.0%, 10.6%, 68.3%
  • Claude Opus 4.6 (closed source): 90.5%, 34.2%, 78.7%
  • OpenAI o3 (closed source): 83.3%, 24.9%, 99.2%, 62.3%
  • OpenAI o1 (closed source): 76.8%, 96.4%, 48.9%
  • GPT-4o (closed source): 49.2%, 2.7%, 32.3%, 89.3%, 31.0%

  • Model card

    Architecture Overview:
    • Hybrid Mamba-Transformer mixture-of-experts (MoE) architecture with 30B total parameters and 3B active per token.
    • 1 million-token context window enabling extended agent coherence and multi-document reasoning for complex operations.
    • Multi-token prediction (MTP) generates multiple future tokens simultaneously in one forward pass for faster sequence generation.
    • Latent MoE captures more nuanced patterns and handles diverse inputs better than traditional architectures.
    • Thinking budget feature avoids overthinking and optimizes for lower, predictable inference cost.
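The "3B active of 30B total" arrangement above comes from MoE routing: a gate selects a few experts per token, so most parameters stay idle on any given forward pass. Below is a minimal, self-contained sketch of top-k expert routing; the dimensions, gate, and tiny linear "experts" are illustrative stand-ins, not Nemotron's actual layer.

```python
import numpy as np

def moe_forward(x, gate_w, experts, k=2):
    """Route one token through a top-k mixture-of-experts layer.

    Only k experts run per token, which is how a large-total-parameter
    MoE spends only a small "active" parameter budget on each token.
    """
    logits = x @ gate_w                       # gating scores, one per expert
    top = np.argsort(logits)[-k:]             # indices of the k best experts
    weights = np.exp(logits[top])
    weights /= weights.sum()                  # softmax over selected experts only
    # Weighted sum of the selected experts' outputs; the rest stay idle.
    return sum(w * experts[i](x) for w, i in zip(weights, top))

rng = np.random.default_rng(0)
d, num_experts = 8, 4
gate_w = rng.normal(size=(d, num_experts))
# Each "expert" is a tiny linear map standing in for a feed-forward block.
expert_ws = [rng.normal(size=(d, d)) for _ in range(num_experts)]
experts = [lambda x, w=w: x @ w for w in expert_ws]

out = moe_forward(rng.normal(size=d), gate_w, experts, k=2)
print(out.shape)  # one output vector per token, computed by 2 of 4 experts
```

With k=2 of 4 experts active, roughly half the expert parameters touch each token; Nemotron's 3B-of-30B ratio follows the same principle at scale.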

    Training Methodology:
    • Multi-environment reinforcement learning (RL) training across 10 environments using open-sourced NVIDIA NeMo Gym.
    • Trained on 10 trillion tokens of curated datasets covering code, scientific reasoning, math, function calling, and instruction following.
    • Post-training RL achieves leading accuracy by training across diverse environments for robust agentic behavior.
    • NVFP4 quantization applied to deliver high throughput and compute efficiency while maintaining accuracy.
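NVFP4 stores weights as 4-bit floating-point (E2M1) values paired with per-block scale factors. The sketch below simulates that round-trip in NumPy with one shared scale per block; it is a simplified illustration (the real format uses small blocks with FP8 scales), not NVIDIA's implementation.

```python
import numpy as np

# The 8 non-negative magnitudes representable in FP4 (E2M1); NVFP4 pairs
# these 4-bit values with a per-block scale factor.
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def quantize_block(block):
    """Quantize a block of weights to FP4 with one shared scale (a sketch)."""
    scale = max(np.abs(block).max() / FP4_GRID[-1], 1e-12)
    scaled = block / scale
    # Snap each magnitude to the nearest FP4 grid point, keeping the sign.
    idx = np.abs(FP4_GRID[None, :] - np.abs(scaled)[:, None]).argmin(axis=1)
    return np.sign(scaled) * FP4_GRID[idx], scale

def dequantize(q, scale):
    return q * scale

rng = np.random.default_rng(0)
w = rng.normal(size=16).astype(np.float32)
q, s = quantize_block(w)
err = np.abs(dequantize(q, s) - w).max()
print(f"max abs error: {err:.3f}")
```

The payoff is memory and bandwidth: each weight shrinks from 16 or 32 bits to 4 bits plus a shared scale, at the cost of the rounding error measured above.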

    Performance Characteristics:
    • 4x higher token generation efficiency compared to Nemotron 2 via hybrid Mamba-Transformer MoE architecture.
    • Leading accuracy on agentic benchmarks: SWE Bench coding, GPQA Diamond scientific reasoning, AIME 2025 math.
    • Strong function calling and instruction following: IFBench, Arena Hard performance.
    • Extreme long-context capabilities: RULER benchmark with 1M token context for RAG pipelines.
    • Best-in-class among open MoE models with 30B or fewer total parameters across targeted agentic tasks.

  • Applications & use cases

    Agentic AI Applications:
    • Building reliable, high-throughput AI agents for complex, multi-step operations and long-duration tasks.
    • Multi-agent collaboration systems requiring extended conversation history and plan state retention.
    • Autonomous agents for customer service, IT ticket automation, and business process orchestration.

    Code & Development:
    • SWE Bench workflows: code review, summarization, debugging, and generation with long context.
    • Long-form code understanding and generation across multiple files and repositories.
    • Software engineering agents for automated code maintenance and refactoring.

    Enterprise & Compliance:
    • Financial transaction compliance monitoring and cybersecurity fraud detection, supported by strong instruction following (IFBench).
    • Automated generation of long-form business reports from many inputs (dashboards, logs, memos, meeting notes).
    • Compliance report generation maintaining consistency across sections with 1M token context.

    RAG & Document Processing:
    • Retrieval-augmented generation pipelines processing extensive multi-document contexts.
    • Cross-document reasoning and synthesis across hundreds of documents simultaneously.
    • Long-form content generation that stays coherent across extended contexts, reflected in RULER benchmark performance.
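A 1M-token window changes the RAG recipe: instead of retrieving a handful of snippets, whole documents can be packed into one prompt. The sketch below greedily packs documents under a token budget using a rough chars-per-token heuristic; the function name and 4-chars-per-token estimate are assumptions (a real pipeline would count tokens with the model's tokenizer).

```python
def pack_documents(docs, budget_tokens=1_000_000, chars_per_token=4):
    """Greedily pack whole documents into a single long-context prompt.

    docs is a list of (name, text) pairs; returns the packed prompt and
    a rough token estimate based on a chars-per-token heuristic.
    """
    budget_chars = budget_tokens * chars_per_token
    packed, used = [], 0
    for name, text in docs:
        header = f"\n--- Document: {name} ---\n"
        cost = len(header) + len(text)
        if used + cost > budget_chars:
            break  # out of context budget; remaining docs go to a later pass
        packed.append(header + text)
        used += cost
    return "".join(packed), used // chars_per_token

docs = [("report.txt", "Q3 revenue grew 12%."),
        ("memo.txt", "Ship date moved to May.")]
prompt, approx_tokens = pack_documents(docs)
print(approx_tokens)
```

Labeling each document with a header, as above, is a common way to help the model attribute answers to sources during cross-document reasoning.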

    Function Calling & Integration:
    • Complex function calling and API orchestration for enterprise system integration.
    • Tool use and multi-step reasoning for autonomous workflow execution.
    • High-accuracy instruction following for mission-critical business applications.
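In practice, function calling is driven through an OpenAI-compatible request with a JSON Schema tool definition, a format NVIDIA's API catalog endpoints also accept. The sketch below only builds the request payload; the model id is a placeholder (the real id comes from the API catalog), and the `create_ticket` tool is hypothetical.

```python
import json

# Hypothetical tool schema in the OpenAI-compatible function-calling format.
tools = [{
    "type": "function",
    "function": {
        "name": "create_ticket",
        "description": "Open an IT support ticket.",
        "parameters": {
            "type": "object",
            "properties": {
                "summary": {"type": "string"},
                "priority": {"type": "string",
                             "enum": ["low", "medium", "high"]},
            },
            "required": ["summary"],
        },
    },
}]

payload = {
    "model": "nvidia/nemotron-3-nano",   # placeholder model id
    "messages": [{"role": "user",
                  "content": "My laptop won't boot, it's urgent."}],
    "tools": tools,
    "tool_choice": "auto",               # let the model decide when to call
}
print(json.dumps(payload, indent=2)[:60])
```

The model responds with a structured `tool_calls` entry naming the function and its JSON arguments, which the agent executes before returning the result in a follow-up message.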

Model details
  • Model provider
    NVIDIA
  • Type
    Chat
  • Main use cases
    Chat
    Small & Fast
    Medium General Purpose
  • Deployment
    On-Demand Dedicated
    Monthly Reserved
  • Parameters
    31.6B
  • Context length
    1M
  • Input modalities
    Text
  • Output modalities
    Text