Raphael Liu · Image → Video

Pusa V1: A New Open Model for Image-to-Video Animation

Based on the Wan2.1 architecture, this new 14B-parameter model offers fine-grained control over video generation from still images and text.

Jul 14, 2025

Update · Apache 2.0

A new open-source video generation model named Pusa V1 has been released by developer Raphael Liu. The 14-billion-parameter model is designed primarily for image-to-video tasks, allowing creators to animate a static image using a text prompt, but it also supports direct text-to-video generation.

Pusa V1 builds upon another open model, Wan2.1, the open video foundation model released by Alibaba. Pusa is positioned around producing consistent and controllable character animations, a common goal for developers working in this space. The full model weights are available on Hugging Face under a permissive Apache 2.0 license.

Fine-grained video control

What sets Pusa V1 apart is its collection of tools for directing the final video output with more precision than a single prompt allows. These features give artists and developers a higher degree of control over the generated sequence.

Key capabilities include:

Image-to-Video: Animate a source image with a text prompt.
Start & End Frame Control: Define the first and last frames of a video to guide the animation.
Video Extension: Seamlessly extend the duration of an existing video clip.
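
One way to picture the three modes above is as different choices of which output frames the user supplies up front and which the model must generate. The Python sketch below is an illustration only: `conditioning_mask`, its mode names, and the `given_frames` parameter are hypothetical and are not part of Pusa's actual interface or a description of how the model implements these features.

```python
from typing import List


def conditioning_mask(mode: str, num_frames: int, given_frames: int = 1) -> List[bool]:
    """Return one flag per output frame: True = supplied by the user, False = generated.

    Hypothetical helper for illustration; not Pusa's real API.
    """
    if num_frames < 2:
        raise ValueError("num_frames must be at least 2")

    mask = [False] * num_frames
    if mode == "image_to_video":
        # Only the source image is fixed, as the first frame.
        mask[0] = True
    elif mode == "start_end":
        # Both endpoints are fixed; everything in between is generated.
        mask[0] = True
        mask[-1] = True
    elif mode == "extension":
        # The existing clip occupies the leading frames; the rest is new.
        if not 1 <= given_frames < num_frames:
            raise ValueError("given_frames must be in [1, num_frames)")
        for i in range(given_frames):
            mask[i] = True
    else:
        raise ValueError(f"unknown mode: {mode}")
    return mask


if __name__ == "__main__":
    for mode in ("image_to_video", "start_end", "extension"):
        print(mode, conditioning_mask(mode, num_frames=5, given_frames=3))
```

Seen this way, the three features differ only in where the fixed frames sit in the sequence, which is why a single model can plausibly cover all of them.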

The model's open availability and focus on controllable animation tools make it a notable new entry for developers and artists experimenting with generative video, particularly for storytelling and character-driven content.