Tencent Releases HY-OmniWeaving for Multi-Image Video

Built on Tencent's HunyuanVideo-1.5 architecture, the new model synthesizes video by combining multiple static images and text prompts into a cohesive narrative.

Mar 31, 2026

Tencent has released HY-OmniWeaving, an open-source model that generates video from a combination of multiple images and text prompts. Developed by the Tencent Hunyuan team, the model goes a step beyond single-image animation and focuses on building dynamic video sequences from several static sources.

Unlike many image-to-video models that animate a single input, HY-OmniWeaving is engineered to "weave" together different images into a coherent story. According to the release on Hugging Face, the model is based on the team's existing HunyuanVideo-1.5 model. Users can provide a series of images and use text to guide the motion and transitions between them, effectively directing a short, multi-shot scene.

From Static to Sequence

The model's core capability is interpreting the spatial and narrative relationships between distinct images, which lets it produce more complex video content than simple motion effects applied to a single image. Key features include:

Multi-image input: The model can synthesize a single video from several different source images.
Text-guided narrative: Text prompts control the action and flow of the generated video sequence.
Temporal consistency: It aims to maintain a consistent look and feel as it transitions between elements from different source images.

This approach shifts generative video from animating a single image toward directing multi-shot sequences. Potential applications include creating short narratives from storyboards, generating dynamic product slideshows, and animating comic panels. The model is available now for developers and researchers to explore.
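To make the storyboard use case concrete, here is a minimal Python sketch of how the inputs described above (an ordered series of images, each paired with guiding text) might be organized before being handed to a model. This is an illustration only: the `Shot` and `Storyboard` classes, their methods, and the file names are hypothetical and are not part of HY-OmniWeaving or any Tencent release. The sketch does not call the model, and the way shots are numbered in the combined prompt is an assumption rather than a documented prompt format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Shot:
    """One source image plus the text meant to guide its motion (hypothetical)."""
    image_path: str  # path to a static source image
    prompt: str      # text describing the action or transition for this shot


@dataclass
class Storyboard:
    """An ordered set of shots intended for a single generated video (hypothetical)."""
    shots: List[Shot] = field(default_factory=list)

    def add(self, image_path: str, prompt: str) -> "Storyboard":
        # Reject empty prompts so every image has text guiding it.
        if not prompt.strip():
            raise ValueError("each shot needs a non-empty text prompt")
        self.shots.append(Shot(image_path, prompt.strip()))
        return self  # returning self allows chained calls

    def image_paths(self) -> List[str]:
        # Images in the order they should appear in the video.
        return [shot.image_path for shot in self.shots]

    def combined_prompt(self) -> str:
        # Number each shot so the text lines up with the image order.
        # This numbering scheme is an assumption, not a documented format.
        return " ".join(
            f"Shot {i}: {shot.prompt}"
            for i, shot in enumerate(self.shots, start=1)
        )


board = (
    Storyboard()
    .add("panel_01.png", "A detective steps out of the rain into a dim office.")
    .add("panel_02.png", "She picks up a ringing phone as the camera pushes in.")
)
print(board.image_paths())
print(board.combined_prompt())
```

Keeping the images and their prompts together in one ordered structure makes it harder for the text to drift out of step with the image sequence. How the model expects these inputs to be passed is defined by the release on Hugging Face, not by this sketch.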

Sources

tencent/HY-OmniWeaving (Hugging Face)
