Zhipu AI Releases SCAIL-2 for Character Animation
The new open-source diffusion model from the company's research arm generates video clips from a single character image and a sequence of poses.

Chinese AI firm Zhipu AI, through its research arm zai, has released SCAIL-2, an open-source model designed for a specific and challenging task: character animation. The new diffusion model can take a single static image of a character and bring it to life as a short video clip, following a user-provided sequence of poses.
The model conditions its video generation on two inputs: a reference image of the character and a control video that encodes the desired motion, typically a sequence of skeletal poses obtained through pose estimation. This gives creators frame-level control over the final animation, letting them direct the character's movements precisely rather than relying on a text prompt alone.
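To make the two-input idea concrete, here is a minimal sketch of how a pose sequence could be rasterized into control frames and bundled with a reference image. Everything in it is an assumption for illustration: the five-joint skeleton, the `AnimationRequest` container, and the `build_request` helper are hypothetical and do not reflect SCAIL-2's actual input format or API, which is documented in the model repository.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]   # (x, y), each normalized to [0, 1]
Frame = List[List[int]]       # grayscale image as rows of 0..255 values

# Hypothetical 5-joint skeleton: 0 head, 1 neck, 2 hip, 3 left hand, 4 right hand.
# Real pose estimators use richer layouts; this is only an illustration.
BONES = [(0, 1), (1, 2), (1, 3), (1, 4)]


@dataclass
class AnimationRequest:
    """Hypothetical container pairing the two conditioning inputs."""
    reference_image: str          # path to the single character image
    control_frames: List[Frame]   # one rendered pose frame per output frame


def draw_line(frame: Frame, p0: Point, p1: Point) -> None:
    """Rasterize a bone by sampling points along the segment p0 -> p1."""
    h, w = len(frame), len(frame[0])
    x0, y0 = p0[0] * (w - 1), p0[1] * (h - 1)
    x1, y1 = p1[0] * (w - 1), p1[1] * (h - 1)
    steps = int(max(abs(x1 - x0), abs(y1 - y0))) + 1
    for i in range(steps + 1):
        t = i / steps
        x = round(x0 + t * (x1 - x0))
        y = round(y0 + t * (y1 - y0))
        frame[y][x] = 255


def render_pose(joints: List[Point], size: int = 32) -> Frame:
    """Draw one skeleton as white lines on a black square frame."""
    frame = [[0] * size for _ in range(size)]
    for a, b in BONES:
        draw_line(frame, joints[a], joints[b])
    return frame


def build_request(reference_image: str, poses: List[List[Point]],
                  size: int = 32) -> AnimationRequest:
    """Render every pose and bundle the frames with the reference image."""
    frames = [render_pose(p, size) for p in poses]
    return AnimationRequest(reference_image, frames)


# Two poses: arms level, then the left hand raised.
pose_a = [(0.5, 0.1), (0.5, 0.3), (0.5, 0.7), (0.2, 0.5), (0.8, 0.5)]
pose_b = [(0.5, 0.1), (0.5, 0.3), (0.5, 0.7), (0.2, 0.1), (0.8, 0.5)]
request = build_request("character.png", [pose_a, pose_b])
```

In this sketch each control frame maps to one output frame, which is why editing a single pose changes the motion at exactly that point in the clip.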
Why It Matters
While many recent open video models focus on general-purpose text-to-video generation, SCAIL-2 provides a specialized tool for animators, game developers, and creative technologists. By focusing on pose-driven control, it opens up new workflows for creating character-centric content with a high degree of consistency and directorial input.
SCAIL-2 is released under the permissive MIT license, which allows commercial use and lets developers integrate the model into their own applications or build on it. The model and usage instructions are available in the zai-org/SCAIL-2 repository on Hugging Face.
Sources
- zai-org/SCAIL-2 (Hugging Face)