Lightricks LTX-2.3 Generates Video and Audio Together

The new model, based on Stable Video Diffusion, can create video and a corresponding soundtrack simultaneously from text, image, or audio prompts.

Mar 4, 2026

Lightricks, the company behind creative apps such as Facetune and Videoleap, has released the weights for LTX-2.3, a multimodal model for video generation. Unlike many systems that handle visuals and sound separately, LTX-2.3 generates both the video and audio streams from a shared latent space, so the soundtrack is produced together with the visuals rather than added afterward.

The model is an extension of Stability AI's Stable Video Diffusion (SVD) 1.1. Lightricks has adapted the architecture to accept audio as a starting point in addition to text and images, so a single model covers several generation tasks.

Core Capabilities

The model supports several generation modes:

Text-to-Video: Creating video clips from a written description.
Image-to-Video: Animating a static image based on a prompt.
Audio-to-Video: Generating visuals that correspond to a sound input.
Integrated Audio Synthesis: Producing a fitting soundtrack along with the video.
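
The modes above can be pictured as one entry point that picks a task from whichever inputs are supplied. The following is a minimal Python sketch of that idea only, not the LTX-2.3 API: `Mode`, `GenerationRequest`, `resolve_mode`, `output_streams`, and every field name are hypothetical. The rule that an image or audio input takes precedence over a text prompt is also an assumption made for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class Mode(Enum):
    """The three input-driven modes listed in the article."""
    TEXT_TO_VIDEO = "text-to-video"
    IMAGE_TO_VIDEO = "image-to-video"
    AUDIO_TO_VIDEO = "audio-to-video"


@dataclass(frozen=True)
class GenerationRequest:
    """Hypothetical request object; field names are illustrative only."""
    prompt: Optional[str] = None
    image_path: Optional[str] = None
    audio_path: Optional[str] = None
    synthesize_audio: bool = True  # stands in for integrated audio synthesis


def resolve_mode(req: GenerationRequest) -> Mode:
    """Pick a generation mode from the conditioning inputs that are present."""
    if req.image_path and req.audio_path:
        # Assumption: combined image and audio conditioning is out of scope here.
        raise ValueError("this sketch does not cover image and audio together")
    if req.audio_path:
        return Mode.AUDIO_TO_VIDEO
    if req.image_path:
        return Mode.IMAGE_TO_VIDEO
    if req.prompt:
        return Mode.TEXT_TO_VIDEO
    raise ValueError("provide at least one of prompt, image_path, audio_path")


def output_streams(req: GenerationRequest) -> Tuple[str, ...]:
    """Name the streams a request would yield in this sketch."""
    return ("video", "audio") if req.synthesize_audio else ("video",)
```

In this sketch, a request with only a prompt resolves to text-to-video, and adding `image_path` switches it to image-to-video with the prompt acting as guidance. How the released model actually combines or prioritizes inputs is described in its usage instructions, not here.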

The model weights and usage instructions are available on the Hugging Face Hub. LTX-2.3 is released under the Lightricks CreativeML Open RAIL-M license, which permits commercial use subject to specific use-case restrictions. For content creators, the release is another step toward tools that produce picture and sound in a single pass.