
GAIR Releases daVinci-MagiHuman for Video Generation

The new open-source model from the General Artificial Intelligence Research team can create video clips, complete with audio, from a variety of inputs.

Mar 21, 2026


The General Artificial Intelligence Research (GAIR) team has released daVinci-MagiHuman, an open-source model for multimodal video generation. Released under an OpenRAIL-M license, it gives developers and researchers working on creative AI applications a new, openly available tool.

Unlike many video models that work from a text prompt alone, daVinci-MagiHuman is positioned as a “multimodal agent” for creating audio-visual content. It can take a combination of text, images, and audio to produce a final video, offering a more flexible and controllable creative process. For example, a user could provide a still image and a text prompt to animate it into a short clip.

Core Capabilities

The model is built to handle a range of generative tasks, allowing users to direct video creation with different types of input. Key functions highlighted in the project's official release include:

Text-to-Video: Generating video from a descriptive text prompt.
Image-to-Video: Animating a static source image based on text instructions.
Audio-Video Generation: Creating video that is influenced by an audio input.
Video Editing: Performing tasks like style transfer on existing video clips.

This release adds another entry to the rapidly growing field of open-source video generation. Because the model accepts several input types, it gives the community an openly available way to experiment with more complex forms of AI-driven media creation, as an alternative to the large proprietary systems being developed in private labs.

Sources

GAIR/daVinci-MagiHuman
Hugging Face
