Pusa V1: A New Open Model for Image-to-Video Animation
Built on the Wan2.1 architecture, this 14B-parameter model offers fine-grained control over video generation from still images and text.

A new open-source video generation model named Pusa V1 has been released by developer Raphael Liu. The 14-billion-parameter model is designed primarily for image-to-video tasks, allowing creators to animate a static image using a text prompt, but it also supports direct text-to-video generation.
Pusa V1 builds on another open model, Wan2.1, the video generation model released by Alibaba. On top of that base, Pusa focuses on consistent and controllable animation, a common goal for developers working in this space. The full model weights are available on Hugging Face under the permissive Apache 2.0 license.
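For readers who want to fetch the weights, here is a minimal sketch using the `huggingface_hub` library's `snapshot_download` function. The repository ID is the one listed in this article's sources. The helper name `download_pusa_weights` and the default target directory are assumptions made for illustration, not part of the Pusa release, and the download may be large.

```python
# Minimal sketch: fetch the Pusa V1 repository from Hugging Face.
# Assumes `pip install huggingface_hub`; the helper name and default
# directory below are illustrative, not defined by the Pusa project.

REPO_ID = "RaphaelLiu/PusaV1"  # repository ID as listed in this article's sources


def download_pusa_weights(local_dir: str = "./PusaV1") -> str:
    """Download every file in the repository and return the local folder path."""
    # Imported lazily so the module loads even if the dependency is missing.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=REPO_ID, local_dir=local_dir)
```

Calling `download_pusa_weights()` requires network access and enough disk space for the checkpoint. Running inference afterwards depends on the project's own scripts, which are not covered here.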
Fine-grained video control
What sets Pusa V1 apart is its set of tools for directing the final video with more precision than a single prompt allows, giving artists and developers tighter control over the generated sequence.
Key capabilities include:
- Image-to-Video: Animate a source image with a text prompt.
- Start & End Frame Control: Define the first and last frames of a video to guide the animation.
- Video Extension: Seamlessly extend the duration of an existing video clip.
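One way to picture how these three modes differ is as a per-frame mask that marks which frames the user supplies and which the model must generate. The sketch below is purely conceptual: the function `conditioning_mask`, its mode names, and the `given` parameter are hypothetical and do not reflect Pusa's actual interface or internals.

```python
from typing import List


def conditioning_mask(num_frames: int, mode: str, given: int = 1) -> List[bool]:
    """Return one flag per frame: True = supplied by the user, False = to be generated.

    Hypothetical illustration only; the mode names are invented for this sketch.
    """
    if num_frames < 2:
        raise ValueError("need at least two frames")

    mask = [False] * num_frames
    if mode == "image_to_video":
        # Only the first frame (the source image) is fixed.
        mask[0] = True
    elif mode == "start_end":
        # First and last frames are fixed; the model fills in the middle.
        mask[0] = True
        mask[-1] = True
    elif mode == "extend":
        # The first `given` frames come from an existing clip; the rest are new.
        if not 1 <= given < num_frames:
            raise ValueError("`given` must be between 1 and num_frames - 1")
        for i in range(given):
            mask[i] = True
    else:
        raise ValueError(f"unknown mode: {mode}")
    return mask


if __name__ == "__main__":
    for mode in ("image_to_video", "start_end", "extend"):
        print(mode, conditioning_mask(6, mode, given=3))
```

Seen this way, the three capabilities are variations on one idea: fix some frames and let the model generate the others.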
The model's open availability and focus on controllable animation tools make it a notable new entry for developers and artists experimenting with generative video, particularly for storytelling and character-driven content.
Sources
- RaphaelLiu/PusaV1 (Hugging Face)