Ovi Syncs Audio and Video in New Open-Source Model
Built on the Wan2.2 architecture, this new 5-billion-parameter model generates short video clips from a single image and simultaneously creates synchronized audio.
A new open-source model named Ovi combines image-to-video generation with synchronized audio. Released by community developer Chetwin Low, the 5-billion-parameter model generates a short video clip from a static image, along with an audio track that matches the visual content. The entire project is available under the permissive Apache 2.0 license.
The model builds on Wan2.2-TI2V-5B, a text-and-image-to-video model. Ovi refines this base by specializing in the image-to-video task and adding an audio generation component. The result is a unified system that produces clips with matching sound in one pipeline, rather than the silent video that many open-source alternatives output. The model and its components are available in its Hugging Face repository.
How It Works
Ovi's functionality is split into two core components that work in tandem:
- Ovi-I2V: The primary image-to-video module responsible for animating the source image into a coherent video sequence.
- Ovi-AAV: The audio generation module, which creates sound effects or ambient noise designed to align with the generated visuals.
This two-part structure allows the model to first generate the visual motion and then produce an appropriate audio layer, ensuring the final output feels cohesive. The integration of audio directly into the generation pipeline is a key differentiator for the model.
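The video-then-audio flow described above can be sketched in a few lines of Python. This is only an illustration of the data flow, not Ovi's actual code or API: the function names `animate_image`, `generate_audio`, and `generate_clip`, the `Clip` type, and the default frame rate and sample rate are all hypothetical, and the two stages are stubs that return placeholder data. The one point it shows is that the audio stage runs after the video stage and sizes its output to the clip's duration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Clip:
    frames: List[str]    # placeholder frame labels standing in for image tensors
    fps: int
    audio: List[float]   # mono audio samples
    sample_rate: int


def animate_image(image: str, num_frames: int) -> List[str]:
    """Hypothetical stand-in for the image-to-video stage (Ovi-I2V)."""
    return [f"{image}#frame{i}" for i in range(num_frames)]


def generate_audio(frames: List[str], fps: int, sample_rate: int) -> List[float]:
    """Hypothetical stand-in for the audio stage (Ovi-AAV).

    Returns silence whose length matches the duration of the frames.
    """
    num_samples = len(frames) * sample_rate // fps
    return [0.0] * num_samples


def generate_clip(image: str, num_frames: int = 24, fps: int = 24,
                  sample_rate: int = 16000) -> Clip:
    # Stage 1: produce the visual motion from the source image.
    frames = animate_image(image, num_frames)
    # Stage 2: produce audio conditioned on (here: sized to) the frames.
    audio = generate_audio(frames, fps, sample_rate)
    return Clip(frames=frames, fps=fps, audio=audio, sample_rate=sample_rate)


if __name__ == "__main__":
    clip = generate_clip("cat.png", num_frames=48, fps=24)
    video_seconds = len(clip.frames) / clip.fps
    audio_seconds = len(clip.audio) / clip.sample_rate
    print(video_seconds, audio_seconds)  # both tracks cover the same duration
```

In a real system the second stage would condition on the content of the frames, not just their count; the stub keeps only the duration constraint so the example stays runnable.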
For the open-source AI community, Ovi is a move toward creative tools that cover more of the production pipeline. Separate models for video and audio generation exist, but integrated, synchronized systems have largely been the domain of closed, proprietary platforms. By providing an open-licensed model that handles both, Ovi lets developers and creators experiment with AI-generated content that combines picture and sound.
Sources
- chetwinlow1/Ovi (Hugging Face)