NVIDIA Releases Cosmos3 Image-to-Video World Model
The latest release in NVIDIA's 'world model' research family aims to generate coherent and realistic video from a single static image.

NVIDIA has quietly released a new generative video model, Cosmos3 Super Image2Video, making the weights available for download on Hugging Face. The model is designed for the increasingly popular task of image-to-video generation, creating short video clips from a single static input image.
Unlike many other animation models, Cosmos3 is framed by NVIDIA as part of its "world-model" research family. This framing suggests an ambitious goal beyond simple motion generation. World models aim to build an internal representation of the rules and dynamics of an environment, which could lead to more physically plausible and temporally coherent video outputs. The goal is to animate a scene by simulating its evolution rather than just interpolating pixels.
The model is available now for researchers and developers to explore. However, it is released under a custom license that is not open-source and comes with specific use restrictions. Potential users should review the terms carefully before integrating it into their projects.
This release underscores NVIDIA's deep investment in foundational AI research, extending far beyond its role as a hardware provider. As generative video technology advances, models that demonstrate a more robust understanding of the real world, including its physics, object interactions, and causal relationships, will be critical. Cosmos3 is another step by a major industry player in that direction.
Sources
- nvidia/Cosmos3-Super-Image2Video (Hugging Face)