NVIDIA Releases Efficient Nemotron-3 Multimodal MoE
The new 30-billion parameter Mixture-of-Experts model handles text and images while using only 3 billion active parameters for inference.
NVIDIA has released Nemotron-3 Nano Omni, a multimodal model designed for complex reasoning tasks. The release continues the company's push into open models and offers a model that can understand and process both text and visual information.
The defining feature of Nemotron-3 Nano Omni is its efficient architecture. It is a Mixture-of-Experts (MoE) model with 30 billion parameters in total, but it activates only about 3 billion of them for each token it processes (the "30B-A3B" in its repository name). Because each token passes through only a small subset of the experts, inference requires far less computation than a traditional dense model of the same size, although all 30 billion parameters must still be held in memory. The "Nano" in the name reflects this focus on efficiency.
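To make "only a fraction of the parameters is active" concrete, here is a minimal sketch of generic top-k MoE routing, assuming a toy layer with 8 experts, top-2 routing, and scalar "tokens". These numbers, the linear router, and the names `CountingExpert`, `top_k_route`, and `moe_forward` are all hypothetical illustrations, not Nemotron-3's actual expert count, routing scheme, or code. Each expert counts how often it runs, so the saving over a dense layer is visible directly.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(router_logits, k):
    """Pick the k highest-scoring experts and renormalize their gate weights."""
    ranked = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:k]
    gates = softmax([router_logits[i] for i in chosen])
    return list(zip(chosen, gates))


class CountingExpert:
    """Toy expert (hypothetical): scales its input and counts how often it runs."""

    def __init__(self, scale):
        self.scale = scale
        self.calls = 0

    def __call__(self, x):
        self.calls += 1
        return self.scale * x


def moe_forward(x, experts, router_weights, k):
    """Sparse MoE layer for a single scalar token x.

    router_weights is a hypothetical linear router with one weight per expert.
    Only the k selected experts are evaluated; the others are never called.
    """
    logits = [w * x for w in router_weights]
    return sum(g * experts[i](x) for i, g in top_k_route(logits, k))


# Toy configuration (hypothetical, not the model's real expert count or k).
NUM_EXPERTS, TOP_K = 8, 2
experts = [CountingExpert(scale=i + 1) for i in range(NUM_EXPERTS)]
router_weights = [0.1 * i for i in range(NUM_EXPERTS)]

tokens = [0.5, 1.0, -2.0, 3.0]
outputs = [moe_forward(t, experts, router_weights, TOP_K) for t in tokens]

expert_calls = sum(e.calls for e in experts)
dense_calls = NUM_EXPERTS * len(tokens)
print(f"expert evaluations: {expert_calls} of {dense_calls} for a dense equivalent")

# Parameter-level ratio implied by the model name (30B total, ~3B active per token).
print(f"active fraction: {3e9 / 30e9:.0%}")
```

In a real model the 3-billion active figure also includes components every token uses (such as attention layers and embeddings), so the per-layer expert ratio and the whole-model ratio need not match the toy numbers above.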
As a Vision-Language Model (VLM), Nemotron-3 Nano Omni can perform tasks that require a simultaneous understanding of images and language. This makes it suitable for applications like detailed image captioning, visual question answering, and other reasoning challenges that depend on integrating visual context with textual prompts.
Developers can download the model, quantized to NVIDIA's 4-bit NVFP4 floating-point format, from the Hugging Face Hub. It is released under the NVIDIA Open Model License, a custom license whose terms users should review before integrating the model into their projects. The release gives teams building multimodal AI applications a capable, resource-conscious option.
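As NVIDIA describes it, NVFP4 stores each weight as a 4-bit E2M1 float, and each block of 16 consecutive values shares an FP8 (E4M3) scale factor, with an additional per-tensor scale on top. The sketch below fake-quantizes a list of values in that block-scaled style and makes a back-of-the-envelope estimate of weight storage for 30 billion parameters. It simplifies the real format (the block scale stays a Python float and there is no per-tensor scale), the helper names are hypothetical, and it assumes every weight is stored in NVFP4, which an actual checkpoint may not do; the figures are estimates, not the size of the published files.

```python
import math

# Magnitudes representable by a 4-bit E2M1 float (the sign bit is handled separately).
E2M1_VALUES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
BLOCK_SIZE = 16  # values sharing one scale factor


def quantize_block(block):
    """Fake-quantize one block: scale so the largest magnitude maps to 6.0,
    snap each value to the nearest E2M1 magnitude, then scale back.

    Simplification: the real format stores the scale in FP8 (E4M3);
    here it stays a full-precision Python float.
    """
    amax = max(abs(v) for v in block)
    scale = amax / E2M1_VALUES[-1] if amax > 0 else 1.0
    out = []
    for v in block:
        mag = min(E2M1_VALUES, key=lambda q: abs(abs(v) / scale - q))
        out.append(math.copysign(mag * scale, v))
    return out, scale


def quantize(values):
    """Fake-quantize a flat list of weights block by block."""
    out = []
    for start in range(0, len(values), BLOCK_SIZE):
        q, _ = quantize_block(values[start:start + BLOCK_SIZE])
        out.extend(q)
    return out


def bits_per_weight(element_bits=4, scale_bits=8, block_size=BLOCK_SIZE):
    """Effective storage per weight once the shared block scale is amortized."""
    return element_bits + scale_bits / block_size


params = 30e9
nvfp4_gb = params * bits_per_weight() / 8 / 1e9
bf16_gb = params * 16 / 8 / 1e9
print(f"~{nvfp4_gb:.1f} GB (NVFP4, all weights) vs ~{bf16_gb:.1f} GB (BF16)")
```

The shared scale adds only half a bit per weight, which is why small blocks remain affordable: they let each group of 16 values use the full 4-bit range instead of being squeezed by one outlier elsewhere in the tensor.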
Sources
- nvidia/Nemotron-3-Nano-Omni-30B-A3B-Reasoning-NVFP4 (Hugging Face)
More in Any-to-Any

MiniMax Releases M3, a Multimodal MoE Model
The new open-weight model from MiniMax AI combines vision, coding, and reasoning using a Mixture-of-Experts architecture.
Google Releases Gemma 4 12B Multimodal Model
The new 12-billion-parameter open model from DeepMind introduces a unified 'any-to-any' architecture for advanced multimodal tasks.
Google Releases Gemma 4, a 12B 'Any-to-Any' Model
The new 12-billion-parameter model from Google DeepMind is designed to handle a flexible mix of data types, moving beyond traditional text and image inputs.