NVIDIA Releases Cosmos3 Image-to-Video World Model

The latest release in NVIDIA's 'world model' research family aims to generate coherent and realistic video from a single static image.

May 21, 2026

NVIDIA has quietly released a new generative video model, Cosmos3 Super Image2Video, making the weights available for download on Hugging Face. The model is designed for the increasingly popular task of image-to-video generation, creating short video clips from a single static input image.

Unlike many other animation models, Cosmos3 is framed by NVIDIA as part of its "world-model" research family. This framing suggests a goal beyond simple motion generation. World models aim to build an internal representation of the rules and dynamics of an environment, which could lead to more physically plausible and temporally coherent video. The idea is to animate a scene by simulating its evolution rather than just interpolating pixels.

The model is available now for researchers and developers to explore. However, it is released under a custom license that is not open-source and comes with specific use restrictions. Potential users should review the terms carefully before integrating it into their projects.

This release underscores NVIDIA's investment in foundational AI research, which extends well beyond its role as a hardware provider. As generative video technology advances, models that demonstrate a more robust understanding of the real world (its physics, object interactions, and causal relationships) will be critical. Cosmos3 is another step by a major industry player in that direction.

Sources

nvidia/Cosmos3-Super-Image2Video
Hugging Face
