Company

NVIDIA

20 models · US

Categories: Any-to-Any · Embeddings · Text → Video · Text → Image · Text / LLM · Image → Video · Speech → Text · Image Editing · Vision-Language

Releases

NVIDIA/Any-to-Any

NVIDIA's Audio-Visual Flamingo Fuses Sound and Sight

A fully open multimodal model aims to reason jointly across audio, images, and long-form video.

Jul 16, 2026

Any-to-Any · Vision-Language

NVIDIA/Embeddings

NVIDIA's Nemotron-3-Embed 8B tops RTEB retrieval test

The 8-billion-parameter text embedding model claims the top overall spot on the RTEB benchmark, with an eye toward agentic retrieval.

Jul 16, 2026

Embeddings

NVIDIA/Embeddings

NVIDIA's Nemotron 3 Embed tops the RTEB leaderboard

A compact 1B-parameter text embedding model claims the top overall spot on a retrieval benchmark aimed at reflecting real-world use.

Jul 14, 2026

Embeddings

NVIDIA/Any-to-Any

NVIDIA's Audex Unifies Audio Understanding and Speech

A new 30B mixture-of-experts model from NVIDIA handles both listening and speaking within a single audio-text architecture.

Jul 6, 2026

Any-to-Any · Text → Speech

NVIDIA/Text → Video

NVIDIA's Cosmos 3 Edge Brings World Models Closer

A new edge-optimized variant of NVIDIA's Cosmos world-model line aims to run generative video directly on edge hardware.

Jul 1, 2026

Text → Video · Image → Video

NVIDIA/Text → Image

NVIDIA distills Qwen-Image for few-step generation

A DMD2-distilled build of Qwen-Image cuts the number of sampling steps for faster generation while aiming to preserve the original model's output characteristics.

Jul 1, 2026

Text → Image

NVIDIA/Text / LLM

Liquid AI's LFM2.5-230M targets phones and robots

A 230-million-parameter language model built to run on hardware as modest as a Raspberry Pi.

Jul 1, 2026

Text / LLM

NVIDIA/Text / LLM

NVIDIA's Nemotron 3 Puzzle Runs Big on a Lean Budget

A 75-billion-parameter mixture-of-experts reasoning model that activates just 9 billion parameters per token.

Jun 24, 2026

Text / LLM · Reasoning

NVIDIA/Text / LLM

NVIDIA's Nemotron-3 Puzzle Brings a Lean MoE to Reasoning

The 75B-parameter model activates just 9B per token and ships in NVIDIA's NVFP4 format for efficient inference.

Jun 24, 2026

Text / LLM · Reasoning

NVIDIA/Image → Video

NVIDIA Releases Cosmos3 Image-to-Video World Model

The latest release in NVIDIA's 'world model' research family aims to generate coherent and realistic video from a single static image.

May 21, 2026

Image → Video

NVIDIA/Image → Video

NVIDIA Releases SANA, a Camera-Controllable Video Model

The new model, SANA-WM, uses a bidirectional diffusion process to give creators fine-grained control over camera movement and video editing.

May 18, 2026

Image → Video · Text → Video

NVIDIA/Speech → Text

NVIDIA Releases Nemotron-3.5 Streaming ASR Model

The 600-million-parameter model uses a FastConformer architecture for real-time, multilingual speech-to-text applications.

May 15, 2026

Speech → Text

NVIDIA/Image Editing

NVIDIA Releases PiD for High-Quality Image Upscaling

The new component is a specialized VAE decoder that works with Alibaba's Z-Image model to improve super-resolution output.

Apr 28, 2026

Image Editing

NVIDIA/Any-to-Any

NVIDIA Releases Efficient Nemotron-3 Multimodal MoE

The new 30-billion-parameter Mixture-of-Experts model handles text and images while using only 3 billion active parameters for inference.

Apr 24, 2026

Any-to-Any · Reasoning

NVIDIA/Any-to-Any

NVIDIA Releases Nemotron-3-Nano Omni-Modal MoE

The new 30-billion-parameter Mixture-of-Experts model handles any combination of modalities with just 3 billion active parameters.

Apr 20, 2026

Any-to-Any · Reasoning

NVIDIA/Text / LLM

NVIDIA's Nemotron TwoTower is an MoE experiment

An experimental 30B mixture-of-experts base model blends diffusion and Mamba ideas under a two-tower design.

Apr 11, 2026

Text / LLM

NVIDIA/Text / LLM

NVIDIA's Nemotron TwoTower mixes diffusion and Mamba

A new 30B mixture-of-experts base model activates just 3B parameters per token and combines diffusion and Mamba in a hybrid design.

Apr 11, 2026

Text / LLM

NVIDIA/Vision-Language

NVIDIA's New 3B VLM Pinpoints Objects in Images

The new 3-billion-parameter model, based on the company's Eagle architecture, is designed for high-precision visual grounding tasks.

Mar 2, 2026

Vision-Language

NVIDIA/Speech → Text

NVIDIA Releases Streaming Speech-to-Text Model

The 600-million-parameter Nemotron model is designed for real-time English transcription using a cache-aware FastConformer architecture.

Dec 17, 2025

Speech → Text

NVIDIA/Speech → Text

NVIDIA Releases Real-Time Speaker Diarization Model

The new Sortformer-based model is designed for streaming audio, identifying up to four distinct speakers in real time.

Oct 22, 2025

Speech → Text

NVIDIA/Speech → Text

NVIDIA's Parakeet ASR Tackles Multi-Speaker Audio

The 600-million-parameter model offers real-time speech-to-text with speaker diarization, built on the efficient FastConformer architecture.

Oct 15, 2025

Speech → Text

NVIDIA/Speech → Text

NVIDIA Releases Canary 1B v2 Multilingual Speech Model

The new 1-billion-parameter model handles both transcription and translation across 25 European languages using the company's efficient FastConformer architecture.

Aug 4, 2025

Speech → Text

NVIDIA/Speech → Text

NVIDIA Releases 600M Parakeet for Speech Recognition

The new FastConformer model uses a specialized training technique to improve transcription accuracy in noisy, real-world environments.

Aug 4, 2025

Speech → Text

NVIDIA/Speech → Text

NVIDIA Fuses LLM and ASR in Canary-Qwen 2.5B Model

The 2.5-billion-parameter speech model combines a FastConformer encoder with a Qwen LLM decoder, a hybrid approach to transcription.

Jun 26, 2025

Speech → Text