NVIDIA Releases SANA-WM, a Camera-Controllable Video Model
The new model, SANA-WM, uses a bidirectional diffusion process to give creators fine-grained control over camera movement and video editing.
NVIDIA has released SANA-WM, a new open-source video generation model. The name stands for "Stochastic Attention and Autoregressive World Model." The system creates video from text or image prompts, with a particular focus on giving the user explicit control over camera movement.
The model's key innovation is its use of bidirectional diffusion. Unlike many generative models that create a sequence strictly from start to finish, SANA-WM can work in both directions along the timeline. This allows it to perform tasks that are crucial for practical video editing:
- In-filling: Generating missing frames between a known start and end point.
- Out-painting: Extending a video clip forward or backward in time.
- Looping: Creating seamless video loops by ensuring the end connects to the beginning.
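One way to picture the three tasks above is as different layouts of known and missing frames along a timeline. The sketch below is purely illustrative and is not SANA-WM's interface: the function name `plan_frames`, the task labels, and the convention of marking frames to generate with `None` are all assumptions made for this example.

```python
from typing import List, Optional


def plan_frames(task: str, clip_len: int, total_len: int) -> List[Optional[int]]:
    """Lay out known and missing frames for one editing task (hypothetical helper).

    Each slot holds the index of a frame from the user's clip, or None where
    a model would have to generate the frame.
    """
    if clip_len < 1 or total_len <= clip_len:
        raise ValueError("need clip_len >= 1 and total_len > clip_len")
    clip = list(range(clip_len))
    gap = total_len - clip_len  # number of slots not covered by the clip

    if task == "infill":
        # Known start and end frame, everything between is generated.
        if clip_len != 2:
            raise ValueError("infill expects exactly a start and an end frame")
        return [0] + [None] * gap + [1]
    if task == "extend_forward":
        # The clip comes first, new frames follow it.
        return clip + [None] * gap
    if task == "extend_backward":
        # New frames lead into the clip.
        return [None] * gap + clip
    if task == "loop":
        # The clip comes first, and the final slot repeats frame 0 so that
        # playback can wrap around; the frames between are generated.
        if gap < 2:
            raise ValueError("loop needs room for at least one generated frame")
        return clip + [None] * (gap - 1) + [0]
    raise ValueError(f"unknown task: {task}")


if __name__ == "__main__":
    for name in ("infill", "extend_forward", "extend_backward", "loop"):
        print(name, plan_frames(name, 2, 5))
```

In this framing, in-filling and looping both pin down frames at the end of the timeline, which is the case a strictly start-to-finish generator cannot condition on.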
This bidirectional capability is combined with precise camera controls. Users can define a specific camera trajectory, allowing them to dictate pans, zooms, and other movements within the generated scene. This moves beyond simple text-prompting into a more directorial role for the creator, enabling more intentional and dynamic visual storytelling.
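To make the idea of a camera trajectory concrete, here is a minimal sketch that describes a pan and a zoom as one camera setting per frame. How SANA-WM actually encodes trajectories is not covered here; the `CameraKey` type, its `yaw_deg` and `focal_mm` fields, and the linear interpolation are assumptions chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CameraKey:
    """Hypothetical per-frame camera setting (not SANA-WM's format)."""
    yaw_deg: float   # horizontal rotation: changing it over time is a pan
    focal_mm: float  # focal length: increasing it over time is a zoom-in


def linear_trajectory(start: CameraKey, end: CameraKey, num_frames: int) -> List[CameraKey]:
    """Interpolate linearly from start to end, one CameraKey per frame."""
    if num_frames < 2:
        raise ValueError("num_frames must be at least 2")
    trajectory = []
    for i in range(num_frames):
        t = i / (num_frames - 1)  # 0.0 at the first frame, 1.0 at the last
        trajectory.append(
            CameraKey(
                yaw_deg=start.yaw_deg + t * (end.yaw_deg - start.yaw_deg),
                focal_mm=start.focal_mm + t * (end.focal_mm - start.focal_mm),
            )
        )
    return trajectory


if __name__ == "__main__":
    # A 30-degree pan combined with a zoom from 35 mm to 50 mm over 4 frames.
    for key in linear_trajectory(CameraKey(0.0, 35.0), CameraKey(30.0, 50.0), 4):
        print(key)
```

A per-frame list like this is the kind of explicit input that separates camera control from describing the motion in a text prompt and hoping the model follows it.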
Published on Hugging Face by NVIDIA's Efficient Large Model group, SANA-WM is available under a permissive Apache 2.0 license. Its release provides researchers and developers with a powerful tool for exploring controllable and editable video generation, pushing open-source capabilities closer to the sophisticated systems being developed in closed labs.
Sources
- Efficient-Large-Model/SANA-WM_bidirectional (Hugging Face)