The Open Weights

The daily record of open-source AI. New model releases, leaderboards, and what's coming next — written for people who ship.

Refreshed every 12 hours

© 2026 The Open Weights. An independent publication.

Aggregated by Claude · written with Gemini · curated by humans.

Company

NVIDIA

14 models · US

Releases

NVIDIA/Image → Video

NVIDIA Releases Cosmos3 Image-to-Video World Model

The latest release in NVIDIA's 'world model' research family aims to generate coherent and realistic video from a single static image.

May 21, 2026
Image → Video
Cosmos3 Super Image2Video
NVIDIA/Image → Video

NVIDIA Releases SANA-WM, a Camera-Controllable Video Model

The new model, SANA-WM, uses a bidirectional diffusion process to give creators fine-grained control over camera movement and video editing.

May 18, 2026
Image → Video · Text → Video
SANA-WM Bidirectional
NVIDIA/Speech → Text

NVIDIA Releases Nemotron-3.5 Streaming ASR Model

The 600-million-parameter model uses a FastConformer architecture for real-time, multilingual speech-to-text applications.

May 15, 2026
Speech → Text
Nemotron 3.5 ASR Streaming 0.6B
NVIDIA/Image Editing

NVIDIA Releases PiD for High-Quality Image Upscaling

The new component is a specialized VAE decoder that works with Alibaba's Z-Image model to enhance super-resolution tasks.

Apr 28, 2026
Image Editing
NVIDIA PiD (Pixel Diffusion Decoder)
NVIDIA/Any-to-Any

NVIDIA Releases Efficient Nemotron-3 Multimodal MoE

The new 30-billion-parameter Mixture-of-Experts model handles text and images while using only 3 billion active parameters for inference.

Apr 24, 2026
Any-to-Any · Reasoning
Nemotron-3 Nano Omni 30B-A3B Reasoning
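To make "30 billion total, 3 billion active" concrete: in a Mixture-of-Experts layer, a small router scores every expert for each token and only the top-k experts actually run, so most weights sit idle on any given token. The sketch below is a generic top-k router with softmax gates renormalized over the selected experts, written in plain Python. It is one common MoE design, assumed here for illustration; the expert count, `k`, router weights, and toy experts are all hypothetical and are not Nemotron-3's actual configuration.

```python
import math
from typing import Callable, List, Sequence, Tuple

Expert = Callable[[List[float]], List[float]]

def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def moe_forward(
    token: List[float],
    router_weights: List[List[float]],  # one router vector per expert
    experts: List[Expert],
    k: int,
) -> Tuple[List[float], List[int]]:
    """Send one token to its top-k experts and mix their outputs.

    Only k of len(experts) experts run, which is why an MoE's
    "active" parameter count is far below its total.
    """
    # Router score for each expert: dot product with the token vector.
    logits = [sum(w * x for w, x in zip(wv, token)) for wv in router_weights]
    top = sorted(range(len(experts)), key=lambda i: logits[i], reverse=True)[:k]
    # Gate weights are renormalized over the selected experts only.
    gates = softmax([logits[i] for i in top])
    out = [0.0] * len(token)
    for gate, i in zip(gates, top):
        y = experts[i](token)  # unselected experts are never evaluated
        out = [o + gate * v for o, v in zip(out, y)]
    return out, top

# Hypothetical toy setup: 4 experts that each scale their input.
experts: List[Expert] = [lambda x, s=s: [s * v for v in x] for s in (1.0, 2.0, 3.0, 4.0)]
router = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.5, 0.5]]
out, used = moe_forward([2.0, 1.0], router, experts, k=2)
```

Here experts 0 and 3 are selected, so only half of the toy layer runs. At the scale described above, 3 billion active out of 30 billion total means roughly a tenth of the model's parameters take part in each token's forward pass.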
NVIDIA/Any-to-Any

NVIDIA Releases Nemotron-3-Nano Omni-Modal MoE

The new 30-billion-parameter Mixture-of-Experts model handles any combination of modalities with just 3 billion active parameters.

Apr 20, 2026
Any-to-Any · Reasoning
Nemotron-3 Nano Omni 30B-A3B Reasoning
NVIDIA/Vision-Language

NVIDIA's New 3B VLM Pinpoints Objects in Images

The new 3-billion-parameter model, based on the company's Eagle architecture, is designed for high-precision visual grounding tasks.

Mar 2, 2026
Vision-Language
LocateAnything-3B
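A visual grounding model returns a bounding box for the queried object, and "precision" in this setting is commonly scored by intersection-over-union (IoU) against a reference box, with IoU ≥ 0.5 a widely used threshold for counting a hit. The sketch below computes IoU for boxes given as `(x_min, y_min, x_max, y_max)` in pixels. That box format and the example coordinates are assumptions for illustration, not LocateAnything-3B's documented output format or evaluation protocol.

```python
from typing import Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]

def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))  # overlap width
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))  # overlap height
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def is_hit(pred: Box, ref: Box, threshold: float = 0.5) -> bool:
    """Count a predicted box as correct when its IoU meets the threshold."""
    return iou(pred, ref) >= threshold

# Hypothetical predicted vs. reference boxes, offset by 10 px each way.
pred = (10.0, 10.0, 50.0, 50.0)
ref = (20.0, 20.0, 60.0, 60.0)
score = iou(pred, ref)  # 900 / 2300, below the 0.5 threshold
```

A 10-pixel shift on a 40-pixel box already drops IoU below 0.5, which is why high-precision grounding is a harder target than roughly locating an object.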
NVIDIA/Speech → Text

NVIDIA Releases Streaming Speech-to-Text Model

The 600-million-parameter Nemotron model is designed for real-time English transcription using a cache-aware FastConformer architecture.

Dec 17, 2025
Speech → Text
Nemotron Speech Streaming EN 0.6B
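"Cache-aware" streaming means each incoming audio chunk is processed exactly once, with the encoder carrying cached activations forward from earlier chunks (for its self-attention and convolution layers) instead of recomputing overlapping context. The toy sketch below shows the idea for a single causal 1-D convolution in plain Python and checks that chunked output matches one-pass output. The class name, kernel, and chunk size are hypothetical; this does not reflect NeMo's API or Nemotron's actual layers.

```python
from typing import List

class CausalConvStream:
    """Hypothetical toy causal 1-D convolution with a cache of past inputs.

    Stands in for one streaming layer: each chunk is processed once, and
    the last (kernel_size - 1) inputs are carried forward so the next chunk
    sees the same left context it would see in offline processing.
    """

    def __init__(self, kernel: List[float]) -> None:
        self.kernel = kernel
        # Zero-initialized cache matches zero left-padding in offline mode.
        self.cache: List[float] = [0.0] * (len(kernel) - 1)

    def step(self, chunk: List[float]) -> List[float]:
        buf = self.cache + chunk
        out = [
            sum(w * buf[i + j] for j, w in enumerate(self.kernel))
            for i in range(len(chunk))
        ]
        # Keep only the most recent (kernel_size - 1) samples as the new cache.
        keep = len(self.kernel) - 1
        self.cache = buf[len(buf) - keep:]
        return out

def offline(kernel: List[float], signal: List[float]) -> List[float]:
    """The same convolution applied to the whole signal in one pass."""
    return CausalConvStream(kernel).step(signal)

kernel = [0.25, 0.25, 0.5]
signal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
stream = CausalConvStream(kernel)
streamed: List[float] = []
for start in range(0, len(signal), 3):  # feed 3-sample chunks
    streamed += stream.step(signal[start:start + 3])
```

The streamed result equals the offline result even though no chunk ever sees the full signal. Real cache-aware encoders keep analogous caches across many layers and typically trade latency against accuracy through chunk size and look-ahead; this toy has no look-ahead at all.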
NVIDIA/Text → Video · Major release

Tencent Releases HunyuanVideo 1.5 Generation Model

The new diffusion model generates short video clips from text and image prompts, adding another major player to the open video space.

Nov 18, 2025
Text → Video · Image → Video
HunyuanVideo 1.5
NVIDIA/Vision-Language

Baidu Releases Open Vision-Language MoE Model

The new ERNIE 4.5 VL model brings multimodal reasoning to the open-source community with a Mixture-of-Experts architecture of 28 billion total parameters, about 3 billion of which are active per token.

Nov 7, 2025
Vision-Language · Reasoning
ERNIE 4.5 VL 28B A3B Thinking
NVIDIA/Speech → Text

NVIDIA Releases Real-Time Speaker Diarization Model

The new Sortformer-based model is designed for streaming audio, identifying up to four distinct speakers in real time.

Oct 22, 2025
Speech → Text
Streaming Sortformer Diarization 4spk v2.1
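A frame-level diarization model such as Sortformer emits, for every audio frame, an activity probability for each of its speaker slots (four in this model). Turning that into "who spoke when" takes a small post-processing step. The sketch below thresholds each speaker's column independently and merges runs of active frames into segments. The 0.5 threshold, the frame duration, and the example probabilities are illustrative assumptions, not the model's documented post-processing, which may also apply smoothing or minimum-duration filtering.

```python
from typing import List, Optional, Tuple

Segment = Tuple[int, float, float]  # (speaker index, start sec, end sec)

def probs_to_segments(
    probs: List[List[float]],  # probs[t][s]: P(speaker s active at frame t)
    frame_sec: float,
    threshold: float = 0.5,
) -> List[Segment]:
    """Turn frame-level speaker-activity probabilities into time segments.

    Each speaker column is thresholded on its own, so overlapping speech
    (two speakers active in the same frame) is allowed.
    """
    if not probs:
        return []
    segments: List[Segment] = []
    for s in range(len(probs[0])):
        start: Optional[int] = None
        for t, frame in enumerate(probs):
            active = frame[s] >= threshold
            if active and start is None:
                start = t  # speaker turns on
            elif not active and start is not None:
                segments.append((s, start * frame_sec, t * frame_sec))
                start = None
        if start is not None:  # speaker still active at the end of the stream
            segments.append((s, start * frame_sec, len(probs) * frame_sec))
    return sorted(segments, key=lambda seg: (seg[1], seg[0]))

# Hypothetical output for 6 frames and 4 speaker slots; speakers 2 and 3 stay silent.
probs = [
    [0.9, 0.1, 0.0, 0.0],
    [0.8, 0.2, 0.0, 0.0],
    [0.7, 0.6, 0.0, 0.0],  # overlap: speakers 0 and 1 both active
    [0.2, 0.9, 0.0, 0.0],
    [0.1, 0.8, 0.0, 0.0],
    [0.6, 0.1, 0.0, 0.0],
]
# frame_sec=0.5 is a hypothetical value chosen for readable numbers.
segments = probs_to_segments(probs, frame_sec=0.5)
```

The result contains an overlapping turn between speaker 0 and speaker 1, which a single-label-per-frame approach could not represent.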
NVIDIA/Speech → Text

NVIDIA's Parakeet ASR Tackles Multi-Speaker Audio

The 600-million-parameter model offers real-time speech-to-text with speaker diarization, built on the efficient FastConformer architecture.

Oct 15, 2025
Speech → Text
Multitalker Parakeet Streaming 0.6B
NVIDIA/Speech → Text

NVIDIA Releases Canary 1B v2 Multilingual Speech Model

The new 1-billion-parameter model handles both transcription and translation across 25 European languages using the company's efficient FastConformer architecture.

Aug 4, 2025
Speech → Text
Canary 1B v2
NVIDIA/Speech → Text

NVIDIA Releases 600M Parakeet for Speech Recognition

The new FastConformer model uses a specialized training technique to improve transcription accuracy in noisy, real-world environments.

Aug 4, 2025
Speech → Text
Parakeet TDT 0.6B v3
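The "TDT" in the model name refers to a Token-and-Duration Transducer, a transducer variant whose joint network predicts both the next token and a duration, meaning how many encoder frames to advance. The toy sketch below shows greedy decoding under that scheme in plain Python. The scripted joint function, token ids, blank id, and duration set are hypothetical stand-ins for the real encoder, prediction network, and joint network.

```python
from typing import Callable, Dict, List, Tuple

BLANK = 0  # hypothetical blank token id
DURATIONS = [0, 1, 2, 3, 4]  # assumed set of frame jumps the model can predict

# joint(frame_index, last_token) -> (token logits, duration logits)
Joint = Callable[[int, int], Tuple[List[float], List[float]]]

def argmax(xs: List[float]) -> int:
    return max(range(len(xs)), key=xs.__getitem__)

def tdt_greedy_decode(joint: Joint, num_frames: int, max_symbols: int = 5) -> List[int]:
    """Greedy TDT decoding: each step picks a token AND a frame jump."""
    tokens: List[int] = []
    t, last, emitted_here = 0, BLANK, 0
    while t < num_frames:
        token_logits, dur_logits = joint(t, last)
        token = argmax(token_logits)
        skip = DURATIONS[argmax(dur_logits)]
        if token != BLANK:
            tokens.append(token)
            last = token  # a real decoder would update its prediction network here
            emitted_here += 1
        if skip == 0 and (token == BLANK or emitted_here >= max_symbols):
            skip = 1  # force progress on blanks or too many symbols at one frame
        if skip > 0:
            emitted_here = 0
        t += skip
    return tokens

def scripted_joint(script: Dict[Tuple[int, int], Tuple[int, int]], vocab: int = 10):
    """Hypothetical stand-in joint network that replays fixed decisions."""
    visited: List[int] = []

    def joint(t: int, last: int) -> Tuple[List[float], List[float]]:
        visited.append(t)
        token, dur_idx = script[(t, last)]
        token_logits = [0.0] * vocab
        token_logits[token] = 1.0
        dur_logits = [0.0] * len(DURATIONS)
        dur_logits[dur_idx] = 1.0
        return token_logits, dur_logits

    return joint, visited

# Emit 7 then 8 at frame 0 and jump to frame 2; a blank jumps 2 -> 5; emit 9 at frame 5.
script = {(0, BLANK): (7, 0), (0, 7): (8, 2), (2, 8): (BLANK, 3), (5, 8): (9, 1)}
joint, visited = scripted_joint(script)
result = tdt_greedy_decode(joint, num_frames=6)
```

Frames 1, 3, and 4 are never passed to the joint network. Skipping frames this way is the main inference-speed argument for TDT over a standard transducer, which advances only one frame per blank.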
NVIDIA/Speech → Text

NVIDIA Fuses LLM and ASR in Canary-Qwen 2.5B Model

The 2.5-billion-parameter speech model combines a FastConformer encoder with a Qwen LLM decoder, a hybrid approach to transcription.

Jun 26, 2025
Speech → Text
Canary-Qwen 2.5B