
NVIDIA Nemotron 3 Nano

Most efficient agentic AI model with hybrid Mamba-Transformer MoE and 1 million-token context

About model

NVIDIA Nemotron 3 Nano is the most efficient, accuracy-leading open model for building agentic AI applications at production scale. Built on a hybrid Mamba-Transformer mixture-of-experts (MoE) architecture with a 1 million-token context length, it lets developers deploy reliable, high-throughput agents for complex, multi-document, and long-duration operations, with 4x higher token generation efficiency than Nemotron 2.
  • Context length: 1M (million-token context for long-duration agents)
  • Token generation: 4x (faster than Nemotron 2 via multi-token prediction)
  • Active parameters: 3B (active per token, from a 30B-total MoE architecture)

Model key capabilities
  • Leading Agentic Benchmarks: SWE Bench, GPQA Diamond, AIME 2025, HLE, IFBench, RULER
  • Hybrid Architecture: Mamba-Transformer MoE with multi-token prediction (MTP)
  • Extreme Context: 1 million tokens for extended agent coherence and cross-document reasoning
  • Fully Open Source: Model weights, 10T token datasets, recipes, and NeMo Gym tools
Performance benchmarks

Benchmarks reported: AIME 2025, GPQA Diamond, HLE, LiveCodeBench, MATH500, SWE-bench Verified (not every model reports every benchmark).
  • NVIDIA Nemotron 3 Nano: 89.1%, 73.0%, 10.6%, 68.3%
  • Claude Opus 4.6 (closed source): 90.5%, 34.2%, 78.7%
  • OpenAI o3 (closed source): 83.3%, 24.9%, 99.2%, 62.3%
  • OpenAI o1 (closed source): 76.8%, 96.4%, 48.9%
  • GPT-4o (closed source): 49.2%, 2.7%, 32.3%, 89.3%, 31.0%

  • Model card

    Architecture Overview:
    • Hybrid Mamba-Transformer mixture-of-experts (MoE) architecture with 30B total parameters and 3B active per token.
    • 1 million-token context window enabling extended agent coherence and multi-document reasoning for complex operations.
    • Multi-token prediction (MTP) generates multiple future tokens simultaneously in one forward pass for faster sequence generation.
    • Latent MoE captures more nuanced patterns and handles diverse inputs better than traditional architectures.
    • Thinking budget feature avoids overthinking and optimizes for lower, predictable inference cost.
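The "3B active of 30B total" arrangement above comes from MoE routing: a gate selects a few experts per token, so most parameters stay idle on any given forward pass. Below is a minimal, self-contained sketch of top-k expert routing; the dimensions, gate, and tiny linear "experts" are illustrative stand-ins, not Nemotron's actual layer.

```python
import numpy as np

def moe_forward(x, gate_w, experts, k=2):
    """Route one token through a top-k mixture-of-experts layer.

    Only k experts run per token, which is how a large-total-parameter
    MoE spends only a small "active" parameter budget on each token.
    """
    logits = x @ gate_w                       # gating scores, one per expert
    top = np.argsort(logits)[-k:]             # indices of the k best experts
    weights = np.exp(logits[top])
    weights /= weights.sum()                  # softmax over selected experts only
    # Weighted sum of the selected experts' outputs; the rest stay idle.
    return sum(w * experts[i](x) for w, i in zip(weights, top))

rng = np.random.default_rng(0)
d, num_experts = 8, 4
gate_w = rng.normal(size=(d, num_experts))
# Each "expert" is a tiny linear map standing in for a feed-forward block.
expert_ws = [rng.normal(size=(d, d)) for _ in range(num_experts)]
experts = [lambda x, w=w: x @ w for w in expert_ws]

out = moe_forward(rng.normal(size=d), gate_w, experts, k=2)
print(out.shape)  # one output vector per token, computed by 2 of 4 experts
```

With k=2 of 4 experts active, roughly half the expert parameters touch each token; Nemotron's 3B-of-30B ratio follows the same principle at scale.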

    Training Methodology:
    • Multi-environment reinforcement learning (RL) training across 10 environments using open-sourced NVIDIA NeMo Gym.
    • Trained on 10 trillion tokens of curated datasets covering code, scientific reasoning, math, function calling, and instruction following.
    • Post-training RL achieves leading accuracy by training across diverse environments for robust agentic behavior.
    • NVFP4 quantization applied to deliver high throughput and compute efficiency while maintaining accuracy.
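NVFP4 stores weights as 4-bit floating-point (E2M1) values paired with per-block scale factors. The sketch below simulates that round-trip in NumPy with one shared scale per block; it is a simplified illustration (the real format uses small blocks with FP8 scales), not NVIDIA's implementation.

```python
import numpy as np

# The 8 non-negative magnitudes representable in FP4 (E2M1); NVFP4 pairs
# these 4-bit values with a per-block scale factor.
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def quantize_block(block):
    """Quantize a block of weights to FP4 with one shared scale (a sketch)."""
    scale = max(np.abs(block).max() / FP4_GRID[-1], 1e-12)
    scaled = block / scale
    # Snap each magnitude to the nearest FP4 grid point, keeping the sign.
    idx = np.abs(FP4_GRID[None, :] - np.abs(scaled)[:, None]).argmin(axis=1)
    return np.sign(scaled) * FP4_GRID[idx], scale

def dequantize(q, scale):
    return q * scale

rng = np.random.default_rng(0)
w = rng.normal(size=16).astype(np.float32)
q, s = quantize_block(w)
err = np.abs(dequantize(q, s) - w).max()
print(f"max abs error: {err:.3f}")
```

The payoff is memory and bandwidth: each weight shrinks from 16 or 32 bits to 4 bits plus a shared scale, at the cost of the rounding error measured above.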

    Performance Characteristics:
    • 4x higher token generation efficiency compared to Nemotron 2 via hybrid Mamba-Transformer MoE architecture.
    • Leading accuracy on agentic benchmarks: SWE Bench coding, GPQA Diamond scientific reasoning, AIME 2025 math.
    • Strong function calling and instruction following: IFBench, Arena Hard performance.
    • Extreme long-context capabilities: RULER benchmark with 1M token context for RAG pipelines.
    • Best-in-class among open MoE models with 30B or fewer total parameters across targeted agentic tasks.

  • Applications & use cases

    Agentic AI Applications:
    • Building reliable, high-throughput AI agents for complex, multi-step operations and long-duration tasks.
    • Multi-agent collaboration systems requiring extended conversation history and plan state retention.
    • Autonomous agents for customer service, IT ticket automation, and business process orchestration.

    Code & Development:
    • SWE Bench workflows: code review, summarization, debugging, and generation with long context.
    • Long-form code understanding and generation across multiple files and repositories.
    • Software engineering agents for automated code maintenance and refactoring.

    Enterprise & Compliance:
    • Financial transaction compliance monitoring and cybersecurity fraud detection, supported by strong instruction following (IFBench).
    • Automated generation of long-form business reports from many inputs (dashboards, logs, memos, meeting notes).
    • Compliance report generation maintaining consistency across sections with 1M token context.

    RAG & Document Processing:
    • Retrieval-augmented generation pipelines processing extensive multi-document contexts.
    • Cross-document reasoning and synthesis across hundreds of documents simultaneously.
    • Long-form content generation that stays coherent across extended contexts, reflected in RULER benchmark performance.
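A 1M-token window changes the RAG recipe: instead of retrieving a handful of snippets, whole documents can be packed into one prompt. The sketch below greedily packs documents under a token budget using a rough chars-per-token heuristic; the function name and 4-chars-per-token estimate are assumptions (a real pipeline would count tokens with the model's tokenizer).

```python
def pack_documents(docs, budget_tokens=1_000_000, chars_per_token=4):
    """Greedily pack whole documents into a single long-context prompt.

    docs is a list of (name, text) pairs; returns the packed prompt and
    a rough token estimate based on a chars-per-token heuristic.
    """
    budget_chars = budget_tokens * chars_per_token
    packed, used = [], 0
    for name, text in docs:
        header = f"\n--- Document: {name} ---\n"
        cost = len(header) + len(text)
        if used + cost > budget_chars:
            break  # out of context budget; remaining docs go to a later pass
        packed.append(header + text)
        used += cost
    return "".join(packed), used // chars_per_token

docs = [("report.txt", "Q3 revenue grew 12%."),
        ("memo.txt", "Ship date moved to May.")]
prompt, approx_tokens = pack_documents(docs)
print(approx_tokens)
```

Labeling each document with a header, as above, is a common way to help the model attribute answers to sources during cross-document reasoning.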

    Function Calling & Integration:
    • Complex function calling and API orchestration for enterprise system integration.
    • Tool use and multi-step reasoning for autonomous workflow execution.
    • High-accuracy instruction following for mission-critical business applications.
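In practice, function calling is driven through an OpenAI-compatible request with a JSON Schema tool definition, a format NVIDIA's API catalog endpoints also accept. The sketch below only builds the request payload; the model id is a placeholder (the real id comes from the API catalog), and the `create_ticket` tool is hypothetical.

```python
import json

# Hypothetical tool schema in the OpenAI-compatible function-calling format.
tools = [{
    "type": "function",
    "function": {
        "name": "create_ticket",
        "description": "Open an IT support ticket.",
        "parameters": {
            "type": "object",
            "properties": {
                "summary": {"type": "string"},
                "priority": {"type": "string",
                             "enum": ["low", "medium", "high"]},
            },
            "required": ["summary"],
        },
    },
}]

payload = {
    "model": "nvidia/nemotron-3-nano",   # placeholder model id
    "messages": [{"role": "user",
                  "content": "My laptop won't boot, it's urgent."}],
    "tools": tools,
    "tool_choice": "auto",               # let the model decide when to call
}
print(json.dumps(payload, indent=2)[:60])
```

The model responds with a structured `tool_calls` entry naming the function and its JSON arguments, which the agent executes before returning the result in a follow-up message.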

Model details
  • Model provider
    NVIDIA
  • Type
    Chat
  • Main use cases
    Chat
    Small & Fast
    Medium General Purpose
  • Deployment
    On-Demand Dedicated
    Monthly Reserved
  • Parameters
    31.6B
  • Context length
    1M
  • Input modalities
    Text
  • Output modalities
    Text