Lightricks LTX-2.3 Generates Video and Audio Together
The new model, based on Stable Video Diffusion, can create video and a corresponding soundtrack simultaneously from text, image, or audio prompts.

Lightricks, the company behind popular creative apps like Facetune and Videoleap, has released the weights for LTX-2.3, a multimodal model for video generation. Unlike many systems that handle visuals and sound separately, LTX-2.3 is designed to generate both video and audio streams from a shared latent space, creating a more cohesive output.
The model is an extension of Stability AI's Stable Video Diffusion (SVD) 1.1. Lightricks has adapted the architecture to accept audio as a starting point in addition to text and images, so a single model can handle several generation tasks.
Core Capabilities
The model supports several generation modes:
- Text-to-Video: Creating video clips from a written description.
- Image-to-Video: Animating a static image based on a prompt.
- Audio-to-Video: Generating visuals that correspond to a sound input.
- Integrated Audio Synthesis: Producing a fitting soundtrack along with the video.
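One way to picture the modes above is as a choice driven by which conditioning inputs a request supplies. The sketch below is illustrative only: `GenerationRequest`, `select_mode`, and the priority order (audio, then image, then text) are assumptions made for this example and are not taken from the LTX-2.3 code or documentation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GenerationRequest:
    """Hypothetical request container; not part of any Lightricks API."""
    text: Optional[str] = None
    image_path: Optional[str] = None
    audio_path: Optional[str] = None


def select_mode(req: GenerationRequest) -> str:
    """Pick a generation mode from whichever conditioning inputs are present.

    The priority order here is an assumption for illustration:
    an audio input wins over an image, and an image wins over text alone.
    """
    if req.audio_path is not None:
        return "audio-to-video"
    if req.image_path is not None:
        return "image-to-video"
    if req.text is not None:
        return "text-to-video"
    raise ValueError("at least one of text, image_path, or audio_path is required")


if __name__ == "__main__":
    print(select_mode(GenerationRequest(text="waves breaking on a beach")))
    print(select_mode(GenerationRequest(text="slow zoom", image_path="photo.png")))
```

A real pipeline would likely expose these modes through its own arguments; the actual entry points are in the usage instructions that ship with the weights.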
The model weights and usage instructions are available on the Hugging Face Hub. LTX-2.3 is released under the Lightricks CreativeML Open RAIL-M license, which permits commercial use but restricts certain use cases. The release is another step toward tools that let content creators produce video and sound in a single pass.