
NVIDIA Releases Efficient Nemotron-3 Multimodal MoE

The new 30-billion-parameter Mixture-of-Experts model handles text and images while activating only 3 billion parameters per token during inference.

Apr 24, 2026


NVIDIA has released Nemotron-3 Nano Omni, a multimodal model designed for complex reasoning tasks. The release extends the company's push into open models, giving developers a model that can understand and process both text and visual information.

The defining feature of Nemotron-3 Nano Omni is its efficient architecture. It is a Mixture-of-Experts (MoE) model with 30 billion parameters in total, but a router activates only a fraction of them, roughly 3 billion, for each token it processes. Because per-token compute scales with the active parameters, inference is considerably cheaper than for a dense model of the same total size, although all 30 billion parameters must still be held in memory. That emphasis on efficiency is reflected in the "Nano" in its name.
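To make the 30B-total versus 3B-active distinction concrete, here is a minimal sketch of generic top-k expert routing, the mechanism MoE layers commonly use. It is a toy in plain Python under stated assumptions: the expert count (8), routing width (2), hidden size (4), the helper names, and all weights are hypothetical and do not describe Nemotron-3's actual layers. Using the common rule of thumb of about 2 FLOPs per active parameter per token, 3 billion active parameters work out to roughly 6 GFLOPs per token, versus about 60 GFLOPs for a dense 30-billion-parameter model.

```python
import math
import random

# Toy sizes chosen for illustration only (hypothetical); they are NOT
# Nemotron-3's real expert count, routing width, or hidden dimension.
NUM_EXPERTS = 8
TOP_K = 2
HIDDEN = 4

# Headline figures from the article: ~30B total, ~3B active per token.
TOTAL_PARAMS = 30e9
ACTIVE_PARAMS = 3e9


def make_matrix(rows, cols, seed):
    """Deterministic random weight matrix (rows x cols)."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vec):
    """Plain matrix-vector product."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


# Each toy "expert" is a HIDDEN -> HIDDEN linear map; the router scores experts.
EXPERTS = [make_matrix(HIDDEN, HIDDEN, seed=i) for i in range(NUM_EXPERTS)]
ROUTER = make_matrix(NUM_EXPERTS, HIDDEN, seed=1000)


def moe_layer(token):
    """Send one token to its TOP_K best-scoring experts and mix their outputs."""
    logits = matvec(ROUTER, token)  # one score per expert
    chosen = sorted(range(NUM_EXPERTS), key=lambda i: logits[i], reverse=True)[:TOP_K]
    gates = softmax([logits[i] for i in chosen])  # renormalise over chosen experts only
    out = [0.0] * HIDDEN
    for gate, idx in zip(gates, chosen):
        y = matvec(EXPERTS[idx], token)  # only the chosen experts do any work
        out = [o + gate * v for o, v in zip(out, y)]
    return out, chosen, gates


def forward_flops(active_params):
    """Rule of thumb: ~2 FLOPs per active parameter per token."""
    return 2 * active_params


if __name__ == "__main__":
    out, chosen, gates = moe_layer([0.5, -1.0, 0.25, 2.0])
    print("experts used:", chosen, "gates:", [round(g, 3) for g in gates])
    print("active fraction:", ACTIVE_PARAMS / TOTAL_PARAMS)
    print("MoE GFLOPs/token:", forward_flops(ACTIVE_PARAMS) / 1e9)
    print("dense-30B GFLOPs/token:", forward_flops(TOTAL_PARAMS) / 1e9)
```

Only the chosen experts' weights are multiplied for a given token, which is where the compute saving comes from; the unchosen experts still occupy memory, so the full parameter count must be loaded.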

As a Vision-Language Model (VLM), Nemotron-3 Nano Omni can perform tasks that require a simultaneous understanding of images and language. This makes it suitable for applications like detailed image captioning, visual question answering, and other reasoning challenges that depend on integrating visual context with textual prompts.

Developers can download the model from the Hugging Face Hub in NVFP4, NVIDIA's 4-bit floating-point format, which further reduces its memory footprint. It is released under the NVIDIA Open Model License, a custom license that users should review before integrating the model into their projects. The release gives teams building multimodal AI applications a capable, resource-conscious option.
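NVFP4, as NVIDIA has described it, stores each weight as a 4-bit E2M1 float and gives every block of 16 values a shared FP8 (E4M3) scale, with an additional per-tensor FP32 scale on top. The following is a simplified sketch of that block-scaling idea, not NVIDIA's implementation: scales stay as ordinary Python floats, the per-tensor scale is omitted, ties round toward the smaller magnitude, and the helper names are hypothetical. The storage arithmetic counts 4 bits per weight plus 8 bits of scale per 16 weights (4.5 bits per weight), which puts 30 billion weights at about 16.9 GB, compared with 60 GB in BF16; activations, the KV cache, and any layers kept in higher precision are ignored.

```python
import math

# Magnitudes representable by a 4-bit E2M1 float (1 sign, 2 exponent, 1 mantissa bit).
FP4_E2M1_VALUES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
BLOCK_SIZE = 16  # number of elements that share one scale factor

# 4 bits per weight plus one 8-bit scale per 16 weights = 4.5 bits per weight.
NVFP4_BITS = 4 + 8 / BLOCK_SIZE


def quantize_block(block):
    """Return (scale, FP4 codes) so the block's largest |x| maps onto 6.0."""
    amax = max(abs(x) for x in block)
    scale = amax / 6.0 if amax > 0 else 1.0  # simplification: scale kept as a float
    codes = []
    for x in block:
        mag = abs(x) / scale
        # Nearest representable magnitude; min() picks the smaller value on ties.
        nearest = min(FP4_E2M1_VALUES, key=lambda v: abs(v - mag))
        codes.append(math.copysign(nearest, x))
    return scale, codes


def quantize(values):
    """Split a flat list into BLOCK_SIZE chunks and quantize each chunk."""
    return [quantize_block(values[i:i + BLOCK_SIZE])
            for i in range(0, len(values), BLOCK_SIZE)]


def dequantize(blocks):
    """Multiply every code by its block's scale to recover approximate values."""
    return [code * scale for scale, codes in blocks for code in codes]


def storage_bytes(num_params, bits_per_param):
    return num_params * bits_per_param / 8


if __name__ == "__main__":
    weights = [math.sin(i) for i in range(40)]
    restored = dequantize(quantize(weights))
    print("max abs error:", max(abs(a - b) for a, b in zip(weights, restored)))
    print("30B @ NVFP4 (GB):", storage_bytes(30e9, NVFP4_BITS) / 1e9)
    print("30B @ BF16  (GB):", storage_bytes(30e9, 16) / 1e9)
```

Sharing one scale across a small block lets each block adapt to its own value range, which keeps 4-bit rounding error tolerable without spending more than half a bit per weight on scales.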

Sources

nvidia/Nemotron-3-Nano-Omni-30B-A3B-Reasoning-NVFP4 (Hugging Face)


