NVIDIA Nemotron 3: the free AI model that sees, hears and reasons at the same time

For years, AI agent systems had a structural problem: understanding the world required multiple separate models. One to read text, another to analyze images, another to process audio. Every time the agent had to pass information from one model to the next, it lost time and context. The result was slowness, high costs and errors that accumulated between handoffs.

On April 28, 2026, NVIDIA presented Nemotron 3 Nano Omni — an open-source artificial intelligence model that solves this problem at its root. Instead of chaining several models together, Nemotron 3 Nano Omni unifies text, vision, images and audio into a single reasoning system. One model that sees, hears and reads simultaneously.

What it is and how it works

Nemotron 3 Nano Omni has 30 billion total parameters but uses a mixture-of-experts (MoE) architecture that only activates around 3 billion per inference — the experts needed depending on the task type. The result is a model with the capability of a large one but the computational cost of a small one.

The architecture combines vision and audio encoders within the same system, eliminating the need for separate perception models. This allows it to reason about what appears on a screen, transcribe what it hears in a video and read complex documents — all in the same processing loop, with no intermediate steps.

On benchmarks for complex document understanding, video and audio comprehension, the model ranks first on six open model leaderboards. Against other equivalent open multimodal models, it achieves up to 9 times faster inference speed at the same response quality.

What it's actually used for

NVIDIA designed Nemotron 3 Nano Omni to function as the perception component within larger AI agent systems. It acts as the "eyes and ears" of the system while larger models like Nemotron 3 Super or Nemotron 3 Ultra handle planning and execution.

The three concrete use cases NVIDIA describes are:

NVIDIA Nemotron 3: the free AI model that sees, hears and reasons at the same time

PHOTO: illustrative image generated with AI for informational purposes.

Computer use: the model interprets what appears on a computer screen in real time — text, graphical interfaces, menus — and allows an agent to navigate software systems without API access. H Company already uses it so its agents can interpret full 1080p screen recordings, something that wasn't practical before due to latency.

Document intelligence: analyzes complex documents combining text, tables and images — legal contracts, financial reports, medical records — and answers specific questions about their content with high accuracy.

Audio and video: transcribes, summarizes and answers questions about video and audio content, capturing visual context that audio-only models miss — such as charts or on-screen text within a recording.

Who's already using it

Companies including Palantir, Foxconn, Eka Care and H Company have already adopted Nemotron 3 Nano Omni in production. Dell Technologies, DocuSign, Oracle and Infosys are in the evaluation phase. The list reflects that the model targets the enterprise market directly — healthcare, finance, manufacturing and technology.

Where to download it

Nemotron 3 Nano Omni is open source and available for free on Hugging Face and OpenRouter. It can also be used as a microservice through NVIDIA NIM at build.nvidia.com.

The model runs on NVIDIA hardware across multiple generations — from Ampere to Hopper and Blackwell GPUs — and supports FP8 and NVFP4 quantization for greater efficiency in enterprise deployments. Its lightweight architecture also allows running it locally on hardware like the NVIDIA DGX Spark or NVIDIA Jetson, without depending on the cloud.

The complete Nemotron 3 ecosystem

Nemotron 3 Nano Omni is the first released model in the Nemotron 3 family. The other two — Nemotron 3 Super (120 billion parameters, focused on collaborative agents and high-volume workloads) and Nemotron 3 Ultra (for complex planning and advanced reasoning) — are expected in the first half of 2026.

In parallel, NVIDIA announced at GTC 2026 the Nemotron Coalition: an alliance of AI labs including Mistral AI, Perplexity, LangChain, Cursor and Black Forest Labs, among others, to collaboratively develop the base model that will power the Nemotron 4 family — the next generation after the current one.