
NVIDIA Releases Nemotron-3-Nano Omni-Modal MoE

The new 30-billion-parameter Mixture-of-Experts model handles any combination of modalities with just 3 billion active parameters.

Apr 20, 2026


NVIDIA has introduced Nemotron-3-Nano-Omni, a new model designed for multimodal tasks. The model uses a Mixture-of-Experts (MoE) architecture, a technique that activates only a fraction of the model's total parameters for any given input, which reduces the compute needed at inference time.
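To make the idea of activating only a fraction of the parameters concrete, here is a minimal sketch of top-k expert routing, the general mechanism behind most MoE layers. It is an illustration only, not NVIDIA's implementation: the expert count, the value of k, the toy scalar "experts," and the function names `route_top_k` and `moe_layer` are all assumptions made for this example.

```python
import math
import random

def softmax(scores):
    """Convert raw router scores into probabilities."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(router_scores, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts.

    Weights are renormalized so that they sum to 1 over the chosen experts.
    """
    probs = softmax(router_scores)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]

def moe_layer(x, experts, router_scores, k):
    """Run only the selected experts and mix their outputs."""
    chosen = route_top_k(router_scores, k)
    output = sum(w * experts[i](x) for i, w in chosen)
    return output, [i for i, _ in chosen]

# Hypothetical toy setup (NOT the real model's configuration):
# 10 scalar "experts", of which 1 runs per input.
NUM_EXPERTS = 10
TOP_K = 1
experts = [lambda x, s=i + 1: s * x for i in range(NUM_EXPERTS)]

random.seed(0)
scores = [random.gauss(0, 1) for _ in range(NUM_EXPERTS)]  # stand-in router output
output, used = moe_layer(2.0, experts, scores, TOP_K)
print(f"experts used: {used} of {NUM_EXPERTS}; output = {output}")
```

In a real model the router is a learned layer and each expert is a feed-forward network, but the principle is the same: the experts that are not selected do no work for that input.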

Of its 30 billion total parameters, Nemotron-3-Nano-Omni activates only about 3 billion, roughly one tenth, during inference. The design aims to deliver the capacity of a larger model at a per-input compute cost closer to that of a 3-billion-parameter one, making advanced reasoning more accessible; the full set of weights still has to be stored. The model is available with BF16 (16-bit bfloat16) weights, a common format that trades some precision for lower memory use and faster computation.
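The headline numbers can be turned into a quick back-of-the-envelope estimate. The sketch below uses the rounded figures from the article (30 billion total, 3 billion active) and the fact that a BF16 value occupies 2 bytes. The exact parameter counts may differ, and the estimate covers weights only, ignoring activations, caches, and runtime overhead.

```python
# Rounded figures from the article; exact counts are an assumption.
TOTAL_PARAMS = 30_000_000_000
ACTIVE_PARAMS = 3_000_000_000
BYTES_PER_BF16 = 2  # bfloat16 is a 16-bit format

# Share of the parameters that take part in processing a given input.
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS

# Storage for the weights alone, in decimal gigabytes (1 GB = 1e9 bytes).
total_weight_gb = TOTAL_PARAMS * BYTES_PER_BF16 / 1e9
active_weight_gb = ACTIVE_PARAMS * BYTES_PER_BF16 / 1e9

print(f"active fraction:      {active_fraction:.0%}")
print(f"all weights in BF16:  {total_weight_gb:.0f} GB")
print(f"active weights only:  {active_weight_gb:.0f} GB")
```

The gap between the two storage figures is the trade-off of the MoE design: per-input compute tracks the active parameters, while memory requirements track the total.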

Any-to-Any Reasoning

The model's key feature is its "omni-modal" capability: it can process and generate information across different formats. It handles what the company calls "any-to-any" tasks, meaning it can ingest a mix of inputs, such as text, images, and video, and produce a mix of outputs in response. This flexibility matters for applications that must interpret context spanning multiple data types.

Nemotron-3-Nano-Omni pairs sparse, efficient inference with broad modality coverage. It is governed by a custom NVIDIA license rather than a permissive open-source one, but its availability on the Hugging Face Hub lets researchers and developers experiment with its multimodal reasoning capabilities.

Sources

nvidia/Nemotron-3-Nano-Omni-30B-A3B-Reasoning-BF16 (Hugging Face)

