
EPFL Releases SVI for Streaming Image-to-Video

The new open-source model from Swiss researchers uses a novel chunking method to generate indefinitely long videos from a single still image.

Oct 8, 2025


Researchers at Switzerland's EPFL Visual Intelligence for Transportation (VITA) lab have released an open-source image-to-video model called SVI, which stands for Streaming Video from Images. The model tackles one of the key limitations in current AI video generation: creating clips that last more than a few seconds while maintaining temporal consistency.

Many models generate short, fixed-length videos. SVI, by contrast, is designed to produce videos of arbitrary length from a single starting image. It does this autoregressively, generating the video in overlapping chunks, with each new chunk overlapping the end of the previous one. This "streaming" technique lets the model extend the animation chunk by chunk, with the aim of keeping transitions smooth and coherent over longer durations.
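The overlapping-chunk idea can be illustrated with a minimal sketch. The article does not describe how SVI conditions each chunk or how overlapping frames are merged, so everything below is an assumption: `stream_video` and `generate_chunk` are hypothetical names, the model is replaced by a toy stub, frames are plain floats standing in for image tensors, and overlapping frames are merged with a simple average. It is not SVI's actual code.

```python
from typing import Callable, List

Frame = float  # hypothetical stand-in for an image tensor

# generate_chunk(context, n) is a HYPOTHETICAL model call: it returns n frames
# whose first len(context) frames re-render the context frames.
ChunkModel = Callable[[List[Frame], int], List[Frame]]


def stream_video(
    first_frame: Frame,
    generate_chunk: ChunkModel,
    chunk_len: int,
    overlap: int,
    num_chunks: int,
) -> List[Frame]:
    """Build a video autoregressively from overlapping chunks (illustrative sketch)."""
    if not 0 < overlap < chunk_len:
        raise ValueError("need 0 < overlap < chunk_len")
    video: List[Frame] = [first_frame]
    for _ in range(num_chunks):
        # Condition on the tail of the video generated so far.
        context = video[-overlap:]
        chunk = generate_chunk(context, chunk_len)
        k = len(context)
        # The first k frames of the new chunk overlap existing frames:
        # merge them with a plain average (an assumed blending rule).
        for i in range(k):
            video[-k + i] = 0.5 * (video[-k + i] + chunk[i])
        # Append only the genuinely new frames.
        video.extend(chunk[k:])
    return video


def toy_model(context: List[Frame], n: int) -> List[Frame]:
    """Toy stand-in for a video model: continues a linear ramp from the context."""
    start = context[0]
    return [start + i for i in range(n)]


if __name__ == "__main__":
    frames = stream_video(0.0, toy_model, chunk_len=4, overlap=2, num_chunks=3)
    print(len(frames), frames)
```

With `chunk_len=4`, `overlap=2` and three chunks, the toy run yields 8 frames: the first chunk adds 3 new frames after the seed image, and each later chunk adds `chunk_len - overlap = 2`. Because the loop only ever looks at the last `overlap` frames, it can in principle keep running for as many chunks as requested, which is the property the "streaming" design is after.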

SVI is not trained from scratch: it is fine-tuned from Stability AI's Stable Video Diffusion (SVD) model. The team trained this version on the large-scale WebVid-10M dataset to develop its long-form generation capabilities.

The model and its code are available on the Hugging Face Hub under a permissive MIT license. While still a research artifact, SVI's novel approach offers a promising direction for developers and academics working to overcome the short-form constraints of today's open-source video generation tools.