NVIDIA/Image → Video

NVIDIA Releases SANA-WM, a Camera-Controllable Video Model

The new model, SANA-WM, uses a bidirectional diffusion process to give creators fine-grained control over camera movement and video editing.

May 18, 2026

Notable · Apache 2.0

NVIDIA has released a new open-source video generation model called SANA-WM. The name stands for "Stochastic Attention and Autoregressive World Model." The system is designed to create video from text or image prompts, with a particular focus on giving the user explicit control over the camera's movement.

The model's key innovation is its use of bidirectional diffusion. Unlike many generative models that build a sequence strictly from the first frame to the last, SANA-WM can generate in both temporal directions, filling in frames before, after, or between the ones it is given. This allows it to perform tasks that are crucial for practical video editing:

In-filling: Generating missing frames between a known start and end point.
Out-painting: Extending a video clip forward or backward in time.
Looping: Creating seamless video loops by ensuring the end connects to the beginning.
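One way to picture the three tasks above is as different layouts of known and unknown frame slots. The sketch below is illustrative only and is not SANA-WM's API: the function name `build_conditioning`, the task labels, and the use of `None` for frames to be generated are all assumptions made for this example.

```python
def build_conditioning(task, clip, total):
    """Lay out known frames and empty slots (None) for a generation task.

    Hypothetical helper for illustration; not part of any SANA-WM release.
    `clip` is a list of known frames, `total` is the output length.
    """
    if task == "infill":
        # Known start and end frame, everything in between is generated.
        if len(clip) != 2 or total < 3:
            raise ValueError("infill needs exactly 2 frames and total >= 3")
        return [clip[0]] + [None] * (total - 2) + [clip[1]]

    gap = total - len(clip)
    if gap < 1:
        raise ValueError("total must exceed the number of known frames")

    if task == "extend_forward":
        # Known clip first, new frames appended after it.
        return list(clip) + [None] * gap
    if task == "extend_backward":
        # New frames generated before the known clip.
        return [None] * gap + list(clip)
    if task == "loop":
        # The final slot repeats the first frame so the end meets the start.
        if gap < 2:
            raise ValueError("loop needs at least one free slot")
        return list(clip) + [None] * (gap - 1) + [clip[0]]

    raise ValueError(f"unknown task: {task}")


if __name__ == "__main__":
    for task in ("infill", "extend_forward", "extend_backward", "loop"):
        print(task, build_conditioning(task, ["A", "B"], 5))
```

A start-to-finish generator can only fill slots that come after known frames, which covers the forward-extension layout alone. The other layouts require conditioning on frames that lie later in time than the ones being generated.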

This bidirectional capability is combined with precise camera controls. Users can define a specific camera trajectory, allowing them to dictate pans, zooms, and other movements within the generated scene. Specifying the camera path directly, rather than describing it in a text prompt, gives the creator a more directorial role and enables more intentional and dynamic visual storytelling.
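The article does not say how SANA-WM encodes a trajectory, so the following is only a minimal sketch of the general idea: a camera path expressed as one pose per frame, here produced by linear interpolation between two keyframes. The `CameraPose` fields (`yaw_deg` for a pan, `focal_mm` for a zoom) and the function `interpolate_trajectory` are hypothetical names chosen for this example.

```python
from dataclasses import dataclass


@dataclass
class CameraPose:
    yaw_deg: float   # horizontal pan angle in degrees (assumed parameter)
    focal_mm: float  # focal length in mm; larger means zoomed in (assumed)


def interpolate_trajectory(start, end, num_frames):
    """Return one CameraPose per frame, moving linearly from start to end."""
    if num_frames < 2:
        raise ValueError("a trajectory needs at least 2 frames")
    poses = []
    for i in range(num_frames):
        t = i / (num_frames - 1)  # 0.0 at the first frame, 1.0 at the last
        poses.append(
            CameraPose(
                yaw_deg=start.yaw_deg + t * (end.yaw_deg - start.yaw_deg),
                focal_mm=start.focal_mm + t * (end.focal_mm - start.focal_mm),
            )
        )
    return poses


if __name__ == "__main__":
    # A 30-degree pan combined with a 24 mm to 48 mm zoom over 5 frames.
    for pose in interpolate_trajectory(CameraPose(0.0, 24.0), CameraPose(30.0, 48.0), 5):
        print(pose)
```

A per-frame list like this is what makes the control explicit: each generated frame has a defined viewpoint instead of one inferred from a prompt such as "slowly pan right."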

Published on Hugging Face by NVIDIA's Efficient Large Model group, SANA-WM is available under the permissive Apache 2.0 license. Its release gives researchers and developers an openly licensed model for exploring controllable and editable video generation, pushing open-source capabilities closer to the sophisticated systems being developed in closed labs.

Sources

Efficient-Large-Model/SANA-WM_bidirectional
Hugging Face

