chetwinlow1/Image → Video

Ovi Syncs Audio and Video in New Open-Source Model

Built on the Wan2.2 architecture, this new 5-billion-parameter model generates short video clips from a single image and simultaneously creates synchronized audio.

Sep 30, 2025

Notable · Apache 2.0

A new open-source model named Ovi combines image-to-video generation with synchronized audio. Released by community developer Chetwin Low, the 5-billion-parameter model generates short video clips from a static image, each with an audio track that matches the visual content. The entire project is available under the permissive Apache 2.0 license.

The model is built on Wan2.2-TI2V-5B, a text-and-image-to-video model. Ovi refines this base by specializing in the image-to-video task and adding an audio generation component. The result is a unified system that produces clips with sound, where many open-source alternatives produce silent video only. The model and its components are available in the chetwinlow1/Ovi repository on Hugging Face.
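For readers who want to inspect the released files, here is a minimal sketch of fetching the repository with the `huggingface_hub` client. The repository name comes from the Sources section. The target directory `ovi_weights` and the helper names `repo_url` and `download_checkpoint` are assumptions made for illustration. The sketch only downloads files and does not show how to run the model.

```python
from pathlib import Path

# Repository name as listed under Sources.
REPO_ID = "chetwinlow1/Ovi"


def repo_url(repo_id: str = REPO_ID) -> str:
    """Return the web address of a model repository on Hugging Face."""
    return f"https://huggingface.co/{repo_id}"


def download_checkpoint(target_dir: str = "ovi_weights") -> Path:
    """Download every file in the repository into target_dir.

    target_dir is a hypothetical location chosen for this example.
    Requires `pip install huggingface_hub` and a network connection.
    """
    # Imported lazily so this module still loads without the package installed.
    from huggingface_hub import snapshot_download

    local_path = snapshot_download(repo_id=REPO_ID, local_dir=target_dir)
    return Path(local_path)


if __name__ == "__main__":
    print(repo_url())
```

Calling `download_checkpoint()` pulls the full repository, which for a 5-billion-parameter model is likely to be a multi-gigabyte transfer.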

How It Works

Ovi's functionality is split into two core components that work in tandem:

Ovi-I2V: The primary image-to-video module responsible for animating the source image into a coherent video sequence.
Ovi-AAV: The audio generation module, which creates sound effects or ambient noise designed to align with the generated visuals.

This two-part structure lets the model generate the visual motion first and then produce a matching audio layer, so the final output feels cohesive. Building audio directly into the generation pipeline is the model's key differentiator.
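The two-stage flow described above can be sketched in a few lines of Python. This is a toy illustration, not Ovi's actual code or API: `Clip`, `generate_clip`, `toy_animate`, and `toy_audio` are hypothetical names, and the frame rate, sample rate, and frame count are assumed values chosen only to make the arithmetic visible. The point it shows is that the audio stage is asked for exactly as many samples as the video's duration requires.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Clip:
    frames: List[str]    # stand-in for decoded video frames
    fps: int
    audio: List[float]   # stand-in for mono audio samples
    sample_rate: int

    @property
    def video_seconds(self) -> float:
        return len(self.frames) / self.fps

    @property
    def audio_seconds(self) -> float:
        return len(self.audio) / self.sample_rate


def generate_clip(
    image: str,
    animate: Callable[[str], List[str]],
    add_audio: Callable[[List[str], int], List[float]],
    fps: int = 24,             # assumed value, not taken from the article
    sample_rate: int = 16000,  # assumed value, not taken from the article
) -> Clip:
    # Stage 1: animate the source image into a frame sequence.
    frames = animate(image)
    # Stage 2: request audio whose length matches the video duration.
    n_samples = round(len(frames) / fps * sample_rate)
    audio = add_audio(frames, n_samples)
    if len(audio) != n_samples:
        raise ValueError("audio length does not match video duration")
    return Clip(frames=frames, fps=fps, audio=audio, sample_rate=sample_rate)


def toy_animate(image: str) -> List[str]:
    # Placeholder for an image-to-video module: 48 labelled frames.
    return [f"{image}#frame{i}" for i in range(48)]


def toy_audio(frames: List[str], n_samples: int) -> List[float]:
    # Placeholder for an audio module: silence of the requested length.
    return [0.0] * n_samples


clip = generate_clip("cat.png", toy_animate, toy_audio)
print(clip.video_seconds, clip.audio_seconds)
```

Passing the two stages in as callables keeps the sketch independent of any real model; an actual video or audio generator could be swapped in without changing the length check.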

For the open-source AI community, Ovi is a move toward more complete creative tools. While separate models for video and audio generation exist, integrated and synchronized systems have largely been the domain of closed, proprietary platforms. By providing an open-licensed model that handles both, Ovi lets developers and creators experiment with AI-generated content that combines picture and sound.

Sources

chetwinlow1/Ovi
Hugging Face
