Alibaba's Live-Avatar Animates Photos With Audio
The new 14-billion-parameter model uses audio input to generate realistic talking-head videos from a single still image.
Alibaba's Quark Vision team has released Live-Avatar, an open-source model that turns a still photograph into a talking-head video driven by a separate audio track. The project aims to create realistic, audio-driven digital avatars from a single source image, a common but challenging task in generative AI.
The 14-billion-parameter model is built on Wan2.2-S2V-14B, a speech-to-video foundation model that generates video from a reference image and an audio clip. Live-Avatar is fine-tuned for the specific task of synchronizing lip movements and generating natural head motion that follows the cadence and content of the input audio.
Audio-driven avatar technology has been widely explored in commercial products, but the release of a model at this scale under the permissive Apache 2.0 license is significant. It gives researchers and developers a strong open baseline for building virtual assistants, enhancing accessibility tools, or powering new forms of digital content creation.
The model, code, and usage instructions are now available for developers to explore on the Hugging Face Hub. The repository includes examples demonstrating the model's ability to generate coherent and expressive video from a variety of portrait images.
Sources
- Quark-Vision/Live-Avatar (Hugging Face)