Alibaba Releases Wan2.2, a 14B MoE Video Model
The new Apache 2.0-licensed generator uses a Mixture-of-Experts architecture and is available in the popular Diffusers library format for easier integration.

Alibaba has released Wan2.2, a new text-to-video model, and made it publicly available on the Hugging Face Hub. The checkpoint, officially named Wan2.2-T2V-A14B, is a Mixture-of-Experts model with about 14 billion active parameters (the "A14B" in its name) and is distributed under the permissive Apache 2.0 license.
An Efficient Architecture
At its core, Wan2.2 uses a Mixture-of-Experts (MoE) design. This lets the model hold a large total parameter count while activating only a subset of those parameters on any given forward pass, so inference needs less compute than a dense model of the same total size. Its release in the Diffusers library format also simplifies integration for developers and researchers who want to build on the model.
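To make the "only a subset is active" idea concrete, here is a minimal, generic sketch of sparse expert routing in plain Python. It is an illustration of the general MoE technique under stated assumptions, not Wan2.2's actual routing scheme: the gate weights, the toy experts, and the function names `softmax` and `moe_forward` are all hypothetical.

```python
import math


def softmax(scores):
    """Turn raw gate scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(x, experts, gate_weights, top_k=1):
    """Route input x to the top_k highest-scoring experts.

    experts      -- list of callables, each mapping a vector to a vector
    gate_weights -- one weight row per expert (hypothetical linear gate)
    Returns the blended output and the indices of the experts that ran.
    """
    # Gate score for each expert: dot product of its weight row with x.
    scores = [sum(w * xi for w, xi in zip(row, x)) for row in gate_weights]
    probs = softmax(scores)

    # Keep only the top_k experts; the others are never evaluated.
    ranked = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)
    chosen = ranked[:top_k]

    # Renormalise the gate probabilities over the chosen experts.
    norm = sum(probs[i] for i in chosen)
    output = [0.0] * len(x)
    for i in chosen:
        weight = probs[i] / norm
        y = experts[i](x)
        output = [o + weight * yi for o, yi in zip(output, y)]
    return output, chosen


# Three toy experts standing in for large sub-networks (illustrative only).
toy_experts = [
    lambda v: [2.0 * a for a in v],   # expert 0: doubles the input
    lambda v: [a + 1.0 for a in v],   # expert 1: shifts the input
    lambda v: [-a for a in v],        # expert 2: negates the input
]
toy_gate = [[2.0, 0.0], [0.0, 0.0], [-1.0, 0.0]]

result, used = moe_forward([1.0, 0.0], toy_experts, toy_gate, top_k=1)
print(result, used)
```

With `top_k=1`, only one of the three toy experts runs for this input, which is the source of the compute saving: the cost per forward pass scales with the experts that are selected, not with the total number of experts stored in the model.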
This release is another significant contribution to open-source AI video. By publishing a large model under a permissive license, Alibaba is supporting the work happening outside closed, proprietary systems. It gives developers a new foundation for applications in media, entertainment, and beyond, and adds competition and diversity to generative video.
Sources
- Wan-AI/Wan2.2-T2V-A14B-Diffusers (Hugging Face)