GAIR Releases daVinci-MagiHuman for Video Generation
The new open-source model from the General Artificial Intelligence Research team can create video clips complete with audio from a variety of inputs.

The General Artificial Intelligence Research (GAIR) team has released daVinci-MagiHuman, a new open-source model designed for multimodal video generation. The model is distributed under an OpenRAIL-M license and is aimed at developers and researchers working on creative AI applications.
Unlike many video models that work from a text prompt alone, daVinci-MagiHuman is positioned as a “multimodal agent” for creating audio-visual content. It can take a combination of text, images, and audio to produce a final video, offering a more flexible and controllable creative process. For example, a user could provide a still image and a text prompt to animate it into a short clip.
Core Capabilities
The model is built to handle a range of generative tasks, allowing users to direct video creation with different types of input. Key functions highlighted in the project's official release include:
- Text-to-Video: Generating video from a descriptive text prompt.
- Image-to-Video: Animating a static source image based on text instructions.
- Audio-Video Generation: Creating video that is influenced by an audio input.
- Video Editing: Performing tasks like style transfer on existing video clips.
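To make the relationship between inputs and tasks concrete, here is a minimal Python sketch of how a combination of inputs might map onto the four functions listed above. It is illustrative only: `GenerationRequest`, `select_task`, the task labels, and the priority order are hypothetical names and choices made for this example, not part of the daVinci-MagiHuman codebase or its API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GenerationRequest:
    """Hypothetical container for the inputs a user might supply."""
    text: Optional[str] = None   # descriptive prompt or instruction
    image: Optional[str] = None  # path to a still image
    audio: Optional[str] = None  # path to an audio clip
    video: Optional[str] = None  # path to an existing video clip


def select_task(req: GenerationRequest) -> str:
    """Pick one of the four listed tasks from the inputs that are present.

    The priority order (video, then audio, then image, then text) is an
    assumption made for this sketch, not documented model behavior.
    """
    if req.video is not None:
        return "video-editing"
    if req.audio is not None:
        return "audio-video-generation"
    if req.image is not None:
        return "image-to-video"
    if req.text is not None:
        return "text-to-video"
    raise ValueError("at least one input is required")
```

Under these assumptions, a request carrying a still image plus a text prompt, as in the animation example earlier in the article, would resolve to the image-to-video task, while a text-only request would resolve to text-to-video.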
The release adds another entry to the rapidly expanding field of open-source video generation. By handling multiple input types in a single model, GAIR gives the community a way to explore more complex forms of AI-driven media creation and an accessible alternative to the large proprietary systems being developed in private labs.
Sources
- GAIR/daVinci-MagiHuman (Hugging Face)