Qwen · Alibaba / Image → Video

Alibaba Releases 14B Model for Audio-Driven Video

The new Wan2.2-S2V model, released under a permissive license, turns a still image and a speech track into a realistic talking-head animation.

Aug 25, 2025

Notable · Apache 2.0

Alibaba's Wan team has released Wan2.2-S2V-14B, an open-source model built for one creative task: generating talking-head videos from a single image and an audio file. The 14-billion-parameter model animates a person's face to match a given speech track, in effect turning the still image into a lifelike digital puppet.

The 'S2V' in the model's name stands for Speech-to-Video, which signals its specialized function. Unlike general-purpose text-to-video systems, Wan2.2-S2V focuses on one problem: synchronizing lip and facial movements with an audio source. It analyzes the phonetic content and timing of the audio to produce a matching, natural-looking animation of the provided static image.

Why it matters

This release gives developers and creators a tool for building virtual presenters, dubbing video content into new languages, and generating character animation for digital media. The Apache 2.0 license is particularly notable because it permits broad commercial use, a key distinction from the many research-oriented releases in this space.

Wan2.2-S2V-14B is part of a growing trend toward specialized, open-source AI tools that excel at one task rather than attempting to be all-purpose generators. It adds to Alibaba's portfolio of open models and is available for download and experimentation on Hugging Face.

Sources

Wan-AI/Wan2.2-S2V-14B
Hugging Face
