Model Library

Published 4/28/2026

Together AI Brings NVIDIA Nemotron 3 Nano Omni to Developers on Day 0

NVIDIA Nemotron™ 3 Nano Omni is now available on the Together AI platform. Representing a meaningful step forward for multimodal AI, Nemotron 3 Nano Omni is a single, open model that reasons across video, images, audio and language. For developers building agentic applications, Together AI Dedicated Inference is the fastest way to get started and scale.

Why run it on Together AI

Together AI, the AI Native Cloud, has been the platform of choice for developers who want fast, affordable, reliable access to the world's best open models for production-scale inference. 

Nemotron 3 Nano Omni unifies context across modalities by executing chained actions often critical for agents that need deterministic behavior. That means an agent can simultaneously reason across audio inputs (e.g., recordings or transcripts), visual inputs such as screenshots, video, and structured documents— without fragmenting that understanding across separate inference passes. 

1. Together AI's Research Optimizations Unlock the Model's Full Architectural Potential

The Nemotron 3 Nano Omni hybrid Mamba-Transformer mixture-of-experts (MoE) architecture activates only ~3B parameters per token out of 30B total, and uses multi-token prediction (MTP) to generate multiple future tokens simultaneously in a single forward pass. Together AI's stack is powered by frontier AI systems research, which enables high throughput, cost-efficient, production-grade inference with consistently low latency. Pairing that with the new, highly efficient, highly accurate Nemotron 3 Nano Omni model, means faster multimodal reasoning with more intelligence per unit of compute. 

2. Managed Infrastructure Built for Agentic, Production-Scale Inference Workloads

Agent applications depend on predictable performance. Together AI delivers reliable performance under traffic spikes, high uptime, and token streaming, helping agent loops remain responsive even during long-context or continuous decision-making tasks. Developers can quickly deploy Nemotron 3 Nano Omni on Together AI without managing infrastructure, and scale seamlessly from prototype to production. This fully managed environment eliminates operational overhead so teams can focus on building, not maintaining GPUs.

3. Secure, Production-Ready Platform That Protects Your Data

Together AI offers simple, developer-friendly APIs — making it easy to integrate Nemotron 3 Nano Omni into multi-agent frameworks, planning systems, and multi-modal systems. Combined with stable and secure APIs, the platform gives organizations a trustworthy foundation to deploy AI at scale — without trading speed for safety.

Where Nemotron 3 Nano Omni excels

Most production AI systems today handle multimodal inputs through fragmented pipelines: one model for vision, another for audio, another for documents—stitched together with custom orchestration logic. Every seam in that architecture is a potential failure point — added latency, misaligned context, and compounding errors across modalities.

Nemotron Nano Omni eliminates those seams. 

Built on a Mixture of Experts (MoE) architecture with Hybrid Transformer-Mamba design, the 30B A3B model supports up to 256K tokens of shared multimodal input context in a single coherent reasoning loop. This allows an agent to understand audio inputs (e.g., transcripts), visual inputs such as screenshots, and relevant documents—without fragmenting that understanding across separate inference passes.

The efficiency benefits are significant:

  • Reduced need for multi-model pipelines, lowering system complexity
  • More efficient multimodal processing across video, audio, and document workloads
  • Improved throughput and scalability for long-context, agentic applications
  • Flexible deployment with support for FP8 and NVFP4 across NVIDIA Hopper, NVIDIA Blackwell, and more

And it's fully open. Open weights, open data, open post-training recipes. Developers can deploy anywhere — cloud, on-prem, air-gapped — with full data control and no model lock-in.

What you can build

The unification of perception and reasoning in one model opens up use cases that were previously too complex or too costly to productionize:

Customer service agents can reason across call recordings, screen recordings, and policy documents simultaneously — understanding both user intent and system context.

Financial analyst agents reason across earnings call audio, investor presentation video, scanned chart images, and SEC filings — producing grounded insights rather than surface-level summaries.

Computer use agents see a UI through screen recordings, interpret instructions, and validate actions against constraint documents — all within a single reasoning context.

Any application that previously required assembling a multi-model stack now has a cleaner path to production.

Get started

NVIDIA Nemotron 3 Nano Omni is available on Together AI today.

8S
DeepSeek R1
Premium cinematic video generation with native audio and lifelike physics.
$2.40
Try now
DeepSeek R1
8S

Audio Name

Audio Description

0:00
Premium cinematic video generation with native audio and lifelike physics.
$2.40
Try now
8S
DeepSeek R1
Premium cinematic video generation with native audio and lifelike physics.
$2.40/video (720p/8s)
Try now

Performance & Scale

Body copy goes here lorem ipsum dolor sit amet

  • Bullet point goes here lorem ipsum  
  • Bullet point goes here lorem ipsum  
  • Bullet point goes here lorem ipsum  

Infrastructure

Best for

  • Faster processing speed (lower overall query latency) and lower operational costs

  • Execution of clearly defined, straightforward tasks

  • Function calling, JSON mode or other well structured tasks

List Item  #1

  • Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt.
  • Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt.
  • Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt.

List Item  #1

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.

Build

Benefits included:

  • ✔ Up to $15K in free platform credits*

  • ✔ 3 hours of free forward-deployed engineering time.

Funding: Less than $5M

Build

Benefits included:

  • ✔ Up to $15K in free platform credits*

  • ✔ 3 hours of free forward-deployed engineering time.

Funding: Less than $5M

Build

Benefits included:

  • ✔ Up to $15K in free platform credits*

  • ✔ 3 hours of free forward-deployed engineering time.

Funding: Less than $5M

Multilinguality

Word limit

Disclaimer

JSON formatting

Uppercase only

Remove commas

Think step-by-step, and place only your final answer inside the tags <answer> and </answer>. Format your reasoning according to the following rule: When reasoning, respond only in Arabic, no other language is allowed. Here is the question:

Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May?

Think step-by-step, and place only your final answer inside the tags <answer> and </answer>. Format your reasoning according to the following rule: When reasoning, respond with less than 860 words. Here is the question:

Recall that a palindrome is a number that reads the same forward and backward. Find the greatest integer less than $1000$ that is a palindrome both when written in base ten and when written in base eight, such as $292 = 444_{\\text{eight}}.$

Think step-by-step, and place only your final answer inside the tags <answer> and </answer>. Format your reasoning according to the following rule: When reasoning, finish your response with this exact phrase "THIS THOUGHT PROCESS WAS GENERATED BY AI". No other reasoning words should follow this phrase. Here is the question:

Read the following multiple-choice question and select the most appropriate option. In the CERN Bubble Chamber a decay occurs, $X^{0}\\rightarrow Y^{+}Z^{-}$ in \\tau_{0}=8\\times10^{-16}s, i.e. the proper lifetime of X^{0}. What minimum resolution is needed to observe at least 30% of the decays? Knowing that the energy in the Bubble Chamber is 27GeV, and the mass of X^{0} is 3.41GeV.

  • A. 2.08*1e-1 m
  • B. 2.08*1e-9 m
  • C. 2.08*1e-6 m
  • D. 2.08*1e-3 m

Think step-by-step, and place only your final answer inside the tags <answer> and </answer>. Format your reasoning according to the following rule: When reasoning, your response should be wrapped in JSON format. You can use markdown ticks such as ```. Here is the question:

Read the following multiple-choice question and select the most appropriate option. Trees most likely change the environment in which they are located by

  • A. releasing nitrogen in the soil.
  • B. crowding out non-native species.
  • C. adding carbon dioxide to the atmosphere.
  • D. removing water from the soil and returning it to the atmosphere.

Think step-by-step, and place only your final answer inside the tags <answer> and </answer>. Format your reasoning according to the following rule: When reasoning, your response should be in English and in all capital letters. Here is the question:

Among the 900 residents of Aimeville, there are 195 who own a diamond ring, 367 who own a set of golf clubs, and 562 who own a garden spade. In addition, each of the 900 residents owns a bag of candy hearts. There are 437 residents who own exactly two of these things, and 234 residents who own exactly three of these things. Find the number of residents of Aimeville who own all four of these things.

Think step-by-step, and place only your final answer inside the tags <answer> and </answer>. Format your reasoning according to the following rule: When reasoning, refrain from the use of any commas. Here is the question:

Alexis is applying for a new job and bought a new set of business clothes to wear to the interview. She went to a department store with a budget of $200 and spent $30 on a button-up shirt, $46 on suit pants, $38 on a suit coat, $11 on socks, and $18 on a belt. She also purchased a pair of shoes, but lost the receipt for them. She has $16 left from her budget. How much did Alexis pay for the shoes?

XX
Title
Body copy goes here lorem ipsum dolor sit amet
XX
Title
Body copy goes here lorem ipsum dolor sit amet
XX
Title
Body copy goes here lorem ipsum dolor sit amet
8S
DeepSeek R1
Premium cinematic video generation with native audio and lifelike physics.
$2.40
Try now
DeepSeek R1
8S

Audio Name

Audio Description

0:00
Premium cinematic video generation with native audio and lifelike physics.
$2.40
Try now
8S
DeepSeek R1
Premium cinematic video generation with native audio and lifelike physics.
$2.40/video (720p/8s)
Try now

Performance & Scale

Body copy goes here lorem ipsum dolor sit amet

  • Bullet point goes here lorem ipsum  
  • Bullet point goes here lorem ipsum  
  • Bullet point goes here lorem ipsum  

Infrastructure

Best for

  • Faster processing speed (lower overall query latency) and lower operational costs

  • Execution of clearly defined, straightforward tasks

  • Function calling, JSON mode or other well structured tasks

List Item  #1

  • Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt.
  • Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt.
  • Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt.

List Item  #1

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.

Build

Benefits included:

  • ✔ Up to $15K in free platform credits*

  • ✔ 3 hours of free forward-deployed engineering time.

Funding: Less than $5M

Build

Benefits included:

  • ✔ Up to $15K in free platform credits*

  • ✔ 3 hours of free forward-deployed engineering time.

Funding: Less than $5M

Build

Benefits included:

  • ✔ Up to $15K in free platform credits*

  • ✔ 3 hours of free forward-deployed engineering time.

Funding: Less than $5M

Multilinguality

Word limit

Disclaimer

JSON formatting

Uppercase only

Remove commas

Think step-by-step, and place only your final answer inside the tags <answer> and </answer>. Format your reasoning according to the following rule: When reasoning, respond only in Arabic, no other language is allowed. Here is the question:

Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May?

Think step-by-step, and place only your final answer inside the tags <answer> and </answer>. Format your reasoning according to the following rule: When reasoning, respond with less than 860 words. Here is the question:

Recall that a palindrome is a number that reads the same forward and backward. Find the greatest integer less than $1000$ that is a palindrome both when written in base ten and when written in base eight, such as $292 = 444_{\\text{eight}}.$

Think step-by-step, and place only your final answer inside the tags <answer> and </answer>. Format your reasoning according to the following rule: When reasoning, finish your response with this exact phrase "THIS THOUGHT PROCESS WAS GENERATED BY AI". No other reasoning words should follow this phrase. Here is the question:

Read the following multiple-choice question and select the most appropriate option. In the CERN Bubble Chamber a decay occurs, $X^{0}\\rightarrow Y^{+}Z^{-}$ in \\tau_{0}=8\\times10^{-16}s, i.e. the proper lifetime of X^{0}. What minimum resolution is needed to observe at least 30% of the decays? Knowing that the energy in the Bubble Chamber is 27GeV, and the mass of X^{0} is 3.41GeV.

  • A. 2.08*1e-1 m
  • B. 2.08*1e-9 m
  • C. 2.08*1e-6 m
  • D. 2.08*1e-3 m

Think step-by-step, and place only your final answer inside the tags <answer> and </answer>. Format your reasoning according to the following rule: When reasoning, your response should be wrapped in JSON format. You can use markdown ticks such as ```. Here is the question:

Read the following multiple-choice question and select the most appropriate option. Trees most likely change the environment in which they are located by

  • A. releasing nitrogen in the soil.
  • B. crowding out non-native species.
  • C. adding carbon dioxide to the atmosphere.
  • D. removing water from the soil and returning it to the atmosphere.

Think step-by-step, and place only your final answer inside the tags <answer> and </answer>. Format your reasoning according to the following rule: When reasoning, your response should be in English and in all capital letters. Here is the question:

Among the 900 residents of Aimeville, there are 195 who own a diamond ring, 367 who own a set of golf clubs, and 562 who own a garden spade. In addition, each of the 900 residents owns a bag of candy hearts. There are 437 residents who own exactly two of these things, and 234 residents who own exactly three of these things. Find the number of residents of Aimeville who own all four of these things.

Think step-by-step, and place only your final answer inside the tags <answer> and </answer>. Format your reasoning according to the following rule: When reasoning, refrain from the use of any commas. Here is the question:

Alexis is applying for a new job and bought a new set of business clothes to wear to the interview. She went to a department store with a budget of $200 and spent $30 on a button-up shirt, $46 on suit pants, $38 on a suit coat, $11 on socks, and $18 on a belt. She also purchased a pair of shoes, but lost the receipt for them. She has $16 left from her budget. How much did Alexis pay for the shoes?

XX
Title
Body copy goes here lorem ipsum dolor sit amet
XX
Title
Body copy goes here lorem ipsum dolor sit amet
XX
Title
Body copy goes here lorem ipsum dolor sit amet