ByteDance Releases HuMo for Human Video Generation

The new open-source model specializes in creating realistic videos of people, separating appearance from motion for greater control.

Sep 10, 2025

ByteDance Research has released HuMo, an open-source model built for a notoriously difficult task in AI: generating realistic videos of humans. The model creates short video clips from either a text description or a reference image. The weights and code are available on Hugging Face under the permissive Apache 2.0 license.
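For readers who want to fetch the release, the following is a minimal sketch rather than official instructions. It assumes the `huggingface_hub` Python package is installed and that the repository name matches the one listed under Sources; the helper name `download_humo` and the `./HuMo` target directory are illustrative choices, not part of the project.

```python
# Assumption: the repository ID below is the name listed under Sources.
REPO_ID = "bytedance-research/HuMo"


def download_humo(local_dir: str = "./HuMo") -> str:
    """Download the repository files and return the local path.

    `download_humo` is a hypothetical helper for illustration only.
    """
    # Imported lazily so this file loads even if the package is missing.
    from huggingface_hub import snapshot_download

    # snapshot_download fetches every file in the repository.
    return snapshot_download(repo_id=REPO_ID, local_dir=local_dir)


# Example call (requires network access and enough disk space):
# path = download_humo()
```

The project's own documentation remains the authority on setup and inference steps; this only covers retrieving the files.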

A Focus on Motion and Anatomy

Unlike general-purpose video models, HuMo is specifically designed to understand and render the human form in motion. According to the project's documentation, it uses a diffusion-based architecture that separates a subject's appearance from their movement. A "human prior encoder" helps maintain anatomical consistency, while a "motion-guidance module" allows for more precise control over the action in the generated clip.

This specialized approach enables several key capabilities:

Text-to-Video: Generating a video of a person performing an action described in a prompt.
Image-to-Video: Animating a person from a single still photograph.
Motion Control: Guiding the generation process with specific motion sequences for more directed outputs.
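
The idea of separating appearance from motion can be illustrated with a toy sketch. This is not HuMo's code or API: the `FrameCondition` class, the `build_conditions` function, and the small number lists standing in for embeddings are all hypothetical. It shows only the general pattern, in which one appearance description is held fixed for a whole clip while the motion signal changes per frame.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class FrameCondition:
    """Conditioning for one frame: who is shown, and how they are posed."""
    appearance: Tuple[float, ...]  # identical across all frames of a clip
    motion: Tuple[float, ...]      # changes from frame to frame


def build_conditions(
    appearance: Sequence[float],
    motion_sequence: Sequence[Sequence[float]],
) -> List[FrameCondition]:
    """Pair one fixed appearance vector with each pose in a motion sequence."""
    identity = tuple(appearance)
    return [FrameCondition(identity, tuple(pose)) for pose in motion_sequence]


# Hypothetical stand-ins for an appearance embedding and two motion sequences.
subject = [0.2, 0.7, 0.1]
wave = [[0.0, 0.1], [0.0, 0.5], [0.0, 0.9]]
nod = [[0.3, 0.0], [0.6, 0.0]]

# The same subject can be driven by different motions.
wave_clip = build_conditions(subject, wave)
nod_clip = build_conditions(subject, nod)
```

Because the appearance is stored once and reused, swapping `wave` for `nod` changes what the subject does without changing who the subject is, which is the kind of control the motion-guidance description suggests.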

The release is significant because generating plausible human movement without uncanny or distorted results remains a major hurdle for AI video. By focusing on this specific domain and releasing the model openly, ByteDance gives researchers and creators a model they can inspect, adapt, and build on. The Apache 2.0 license permits commercial use, which opens the door to integration into products ranging from creative software to virtual character animation.

Sources

bytedance-research/HuMo (Hugging Face)