NVIDIA Nemotron 3 Nano API
Most efficient agentic AI model with hybrid Mamba-Transformer MoE and 1 million-token context

This model is not currently supported on Together AI.
Visit our Models page to view all the latest models.
Model details
Architecture Overview:
• Hybrid Mamba-Transformer mixture-of-experts (MoE) architecture with 30B total parameters and 3B active per token.
• 1 million-token context window enabling extended agent coherence and multi-document reasoning for complex operations.
• Multi-token prediction (MTP) generates multiple future tokens simultaneously in one forward pass for faster sequence generation.
• Latent MoE captures more nuanced patterns and handles diverse inputs better than traditional architectures.
• Thinking budget feature avoids overthinking and optimizes for lower, predictable inference cost.
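The gap between total and active parameters comes from sparse expert routing. The sketch below is a toy illustration of top-k routing, not NVIDIA's implementation: the expert count, the value of k, the scalar "experts", and all function names are assumptions chosen for readability. The final line only restates the 30B total and 3B active figures quoted above; in the real model that ratio also reflects the non-expert layers.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(router_logits, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    order = sorted(range(len(router_logits)),
                   key=lambda i: router_logits[i], reverse=True)
    top = order[:k]
    weights = softmax([router_logits[i] for i in top])
    return list(zip(top, weights))

def moe_forward(x, experts, router_logits, k):
    """Weighted sum over the selected experts only; the rest are never evaluated."""
    return sum(w * experts[i](x) for i, w in route_top_k(router_logits, k))

# Toy setup (hypothetical): 8 scalar experts, 2 active per token.
experts = [lambda x, s=s: s * x for s in range(1, 9)]
logits = [0.1, 2.0, -1.0, 0.3, 1.5, 0.0, -0.5, 0.2]

selected = route_top_k(logits, k=2)
y = moe_forward(3.0, experts, logits, k=2)

# Figures from the model description: 30B total, 3B active per token.
active_fraction = 3e9 / 30e9
```

Only the two selected experts run for this input, which is why per-token compute tracks the active parameter count rather than the total.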
Training Methodology:
• Multi-environment reinforcement learning (RL) training across 10 environments using open-sourced NVIDIA NeMo Gym.
• Trained on 10 trillion tokens of curated datasets covering code, scientific reasoning, math, function calling, and instruction following.
• Post-training RL achieves leading accuracy by training across diverse environments for robust agentic behavior.
• NVFP4 quantization applied to deliver high throughput and compute efficiency while maintaining accuracy.
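To give a feel for what 4-bit floating-point quantization involves, here is a simplified sketch of block-scaled quantization. It is not NVIDIA's NVFP4 kernel: it assumes 16-value blocks and the eight FP4 (E2M1) magnitudes 0, 0.5, 1, 1.5, 2, 3, 4, and 6, and it keeps each block scale as an ordinary Python float rather than a compact hardware format. All function names are illustrative.

```python
# Magnitudes representable by a 4-bit E2M1 float (sign handled separately).
E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
BLOCK = 16  # assumed block size for this sketch

def quantize_block(values):
    """Scale a block so its largest magnitude maps to 6.0, then snap to the grid."""
    amax = max(abs(v) for v in values)
    scale = amax / 6.0 if amax > 0 else 1.0
    codes = []
    for v in values:
        mag = min(E2M1_GRID, key=lambda g: abs(abs(v) / scale - g))
        codes.append(-mag if v < 0 else mag)
    return scale, codes

def dequantize_block(scale, codes):
    """Recover approximate values from one block's scale and grid codes."""
    return [scale * c for c in codes]

def quantize(values, block=BLOCK):
    """Quantize a flat list of floats block by block."""
    return [quantize_block(values[i:i + block])
            for i in range(0, len(values), block)]

# Toy weights spanning roughly -0.32 to 0.30.
weights = [0.02 * (i - 16) for i in range(32)]
blocks = quantize(weights)
restored = [x for scale, codes in blocks for x in dequantize_block(scale, codes)]
max_error = max(abs(a - b) for a, b in zip(weights, restored))
```

Because each block carries its own scale, one outlier only coarsens the 16 values that share its block instead of the whole tensor.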
Performance Characteristics:
• 4x higher token generation efficiency compared to Nemotron 2 via hybrid Mamba-Transformer MoE architecture.
• Leading accuracy on agentic benchmarks: SWE-bench (coding), GPQA Diamond (scientific reasoning), and AIME 2025 (math).
• Strong function calling and instruction following, as measured on IFBench and Arena Hard.
• Long-context capability for RAG pipelines, as measured on the RULER benchmark with a 1M-token context.
• Best-in-class among open models with 30B or fewer MoE parameters across targeted agentic tasks.
Applications & Use Cases
Agentic AI Applications:
• Building reliable, high-throughput AI agents for complex, multi-step operations and long-duration tasks.
• Multi-agent collaboration systems requiring extended conversation history and plan state retention.
• Autonomous agents for customer service, IT ticket automation, and business process orchestration.
Code & Development:
• SWE-bench-style workflows: code review, summarization, debugging, and generation over long contexts.
• Long-form code understanding and generation across multiple files and repositories.
• Software engineering agents for automated code maintenance and refactoring.
Enterprise & Compliance:
• Financial transaction compliance monitoring and cybersecurity fraud detection, drawing on the instruction-following accuracy measured by IFBench.
• Automated generation of long-form business reports from many inputs (dashboards, logs, memos, meeting notes).
• Compliance report generation that stays consistent across sections within the 1M-token context.
RAG & Document Processing:
• Retrieval-augmented generation pipelines processing extensive multi-document contexts.
• Cross-document reasoning and synthesis across hundreds of documents simultaneously.
• Long-form content generation that keeps information coherent over long inputs, as reflected in RULER benchmark performance.
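Even with a very large context window, a RAG pipeline still has to decide which retrieved documents fit. The sketch below shows one simple way to do that, assuming a greedy pack by relevance score and a crude whitespace word count as the token estimate. The function names, scores, and budget numbers are hypothetical; a real pipeline would use the model's own tokenizer.

```python
def approx_tokens(text):
    """Crude token estimate: whitespace-separated words (an assumption)."""
    return len(text.split())

def pack_documents(docs, context_budget, reserve_for_output):
    """Greedily keep the highest-scoring documents that fit the prompt budget.

    docs is a list of (doc_id, relevance_score, text) tuples.
    Returns the chosen doc ids and the estimated tokens used.
    """
    available = context_budget - reserve_for_output
    chosen, used = [], 0
    for doc_id, _score, text in sorted(docs, key=lambda d: d[1], reverse=True):
        cost = approx_tokens(text)
        if used + cost <= available:
            chosen.append(doc_id)
            used += cost
    return chosen, used

# Toy corpus with made-up relevance scores.
docs = [
    ("memo", 0.9, "quarterly revenue grew in two regions"),
    ("log", 0.4, "error rate spiked after the deploy on tuesday night"),
    ("notes", 0.7, "team agreed to revisit pricing"),
]
chosen, used = pack_documents(docs, context_budget=20, reserve_for_output=8)
```

Reserving part of the budget for the model's output keeps a long prompt from crowding out the answer.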
Function Calling & Integration:
• Complex function calling and API orchestration for enterprise system integration.
• Tool use and multi-step reasoning for autonomous workflow execution.
• High-accuracy instruction following for mission-critical business applications.
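On the application side, function calling usually means parsing a structured tool request from the model and routing it to real code. The sketch below assumes the model emits a JSON object with "name" and "arguments" fields; that shape, the tool registry, and the stub tool are all hypothetical and not taken from the model's documentation.

```python
import json

def get_ticket_status(ticket_id: str) -> dict:
    """Stub tool standing in for a real IT ticketing lookup."""
    return {"ticket_id": ticket_id, "status": "open"}

# Registry of callable tools, keyed by the name the model is told to use.
TOOLS = {"get_ticket_status": get_ticket_status}

def dispatch(tool_call_json: str) -> dict:
    """Validate and run one tool call, returning a result or an error dict."""
    try:
        call = json.loads(tool_call_json)
    except json.JSONDecodeError as exc:
        return {"error": f"invalid JSON: {exc}"}
    if not isinstance(call, dict):
        return {"error": "tool call must be a JSON object"}
    name = call.get("name")
    args = call.get("arguments", {})
    if name not in TOOLS:
        return {"error": f"unknown tool: {name}"}
    if not isinstance(args, dict):
        return {"error": "arguments must be a JSON object"}
    try:
        return {"result": TOOLS[name](**args)}
    except TypeError as exc:
        # Raised when argument names do not match the tool's signature.
        return {"error": f"bad arguments for {name}: {exc}"}

ok = dispatch('{"name": "get_ticket_status", "arguments": {"ticket_id": "T-1"}}')
unknown = dispatch('{"name": "delete_everything", "arguments": {}}')
```

Returning errors as data rather than raising lets an agent loop hand the failure back to the model so it can correct the call.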
