Alibaba's Wan2.2 Adds Control to Open Video
The new 14-billion-parameter model from Alibaba's PAI team offers fine-grained control over video generation using inputs like sketches and depth maps.
Alibaba's Platform for AI (PAI) team has released a new open-source model for video generation, Wan2.2-VACE-Fun-A14B. This 14-billion-parameter model isn't just another text-to-video generator; its primary focus is on providing creators with a high degree of control over the final output.
The key differentiator for this Wan2.2 variant is its ability to condition video generation on more than just a text prompt. Users can provide structural inputs like depth maps, Canny edge outlines, sketches, and human pose skeletons to guide the creative process. This allows for much more precise control over scene composition and character movement than is possible with text alone.
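To make the idea of a structural input concrete, here is a minimal sketch of how an edge-outline control clip could be derived from source frames, one control frame per video frame. It does not call the Wan2.2 model or any Alibaba API. The names `edge_map` and `control_video` are hypothetical, and a thresholded Sobel gradient stands in for a real Canny detector, which would also apply smoothing, non-maximum suppression, and hysteresis.

```python
from typing import List

Frame = List[List[int]]  # grayscale frame, values 0-255, indexed [row][col]


def edge_map(frame: Frame, threshold: int = 100) -> Frame:
    """Return a binary edge map (0 or 255) from a thresholded Sobel gradient.

    Simplified stand-in for Canny: no smoothing, no non-maximum
    suppression, no hysteresis.
    """
    height, width = len(frame), len(frame[0])
    edges = [[0] * width for _ in range(height)]
    # Border pixels stay 0 because the 3x3 kernel needs all eight neighbours.
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            # Horizontal gradient: right column minus left column.
            gx = (frame[y - 1][x + 1] + 2 * frame[y][x + 1] + frame[y + 1][x + 1]
                  - frame[y - 1][x - 1] - 2 * frame[y][x - 1] - frame[y + 1][x - 1])
            # Vertical gradient: bottom row minus top row.
            gy = (frame[y + 1][x - 1] + 2 * frame[y + 1][x] + frame[y + 1][x + 1]
                  - frame[y - 1][x - 1] - 2 * frame[y - 1][x] - frame[y - 1][x + 1])
            if abs(gx) + abs(gy) >= threshold:
                edges[y][x] = 255
    return edges


def control_video(frames: List[Frame], threshold: int = 100) -> List[Frame]:
    """Build one edge-map control frame per source frame."""
    return [edge_map(f, threshold) for f in frames]


# Toy clip: two 5x5 frames, dark on the left and bright on the right.
frame = [[0, 0, 255, 255, 255] for _ in range(5)]
clip = control_video([frame, frame])
```

In a real workflow, a sequence like `clip` would be supplied alongside the text prompt, so the outlines constrain composition while the prompt describes appearance.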
Under the Hood
The model builds on VACE, the "All-in-One Video Creation and Editing" framework, and uses a parameter-efficient method called "Fun-tuning" to adapt the base model for these specialized control tasks. According to the team, this approach makes it more efficient to train and adapt the model for specific creative needs.
Released under the permissive Apache 2.0 license, Wan2.2-VACE-Fun-A14B joins a growing field of open models that are making advanced video synthesis more accessible. By focusing on controllability, Alibaba is providing a valuable tool for developers and artists who need to move beyond simple prompts and direct their creative process with greater precision.
Sources
- alibaba-pai/Wan2.2-VACE-Fun-A14B (Hugging Face)