Lightricks Releases LTX-2 Multimodal Video Generator
The new diffusion model from the creative app company can generate short video clips from text, images, audio, and even other videos.
Lightricks, the company behind creative apps such as Facetune and Videoleap, has released LTX-2, a multimodal model for video synthesis. Where many generators accept a single input type, LTX-2 can create video clips from text descriptions, still images, audio tracks, and existing videos.
The model is built on a latent video diffusion architecture. It uses a suite of specialized encoders to interpret different inputs: a T5-based encoder for text, a CLIP encoder for images, and a CLAP encoder for audio. These components work together within a U-Net backbone to translate multimodal concepts into coherent video sequences.
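The article does not say how the encoder outputs are combined, so the following is only a minimal sketch of one common pattern: project each modality's embeddings to a shared width and concatenate them into a single conditioning sequence that a denoising backbone could attend to. The names `project` and `build_conditioning`, the zero-padding "projection", and the toy dimensions are all hypothetical and are not taken from LTX-2.

```python
from dataclasses import dataclass
from typing import Dict, List

Vector = List[float]


@dataclass
class Conditioning:
    tokens: List[Vector]    # one embedding per conditioning token
    modalities: List[str]   # which input each token came from


def project(embedding: Vector, out_dim: int) -> Vector:
    """Toy stand-in for a learned linear projection (assumption):
    zero-pad or truncate the embedding to the shared width."""
    return (list(embedding) + [0.0] * out_dim)[:out_dim]


def build_conditioning(inputs: Dict[str, List[Vector]], dim: int) -> Conditioning:
    """Concatenate per-modality embeddings into one sequence.

    `inputs` maps a modality name to the token embeddings produced by
    that modality's encoder. Missing modalities are simply skipped, so
    the same function covers text-only and mixed prompts.
    """
    tokens: List[Vector] = []
    modalities: List[str] = []
    for name in ("text", "image", "audio"):
        for emb in inputs.get(name, []):
            tokens.append(project(emb, dim))
            modalities.append(name)
    return Conditioning(tokens, modalities)


# Two text tokens (width 3) and one image token (width 5), shared width 4.
cond = build_conditioning(
    {
        "text": [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
        "image": [[1.0, 2.0, 3.0, 4.0, 5.0]],
    },
    dim=4,
)
print(len(cond.tokens), cond.modalities)
```

In a real system the projection would be a trained layer and the sequence would feed cross-attention inside the backbone; the sketch only shows how inputs of different widths can share one conditioning interface.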
Open Under a Responsible AI License
Lightricks has made the model weights available on Hugging Face under a CreativeML Open RAIL-M license. This “Responsible AI License” permits commercial use but restricts certain applications, such as generating misinformation or content for malicious purposes. The approach is part of a growing trend in which developers try to balance open access with safeguards against misuse.
While LTX-2 is another step forward for accessible video generation tools, its creators note its limitations. Like many current video models, it can produce visual artifacts and may struggle to maintain temporal consistency over longer clips. The model is available for download and experimentation on the Hugging Face Hub.
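For readers who want to fetch the files, one possible approach is the `snapshot_download` function from the `huggingface_hub` Python package. The sketch below assumes the repository name `Lightricks/LTX-2` given in the article's sources; the wrapper `download_ltx2`, the local directory name, and the file patterns are hypothetical, and the repository's actual file layout is not described in the article.

```python
from pathlib import Path
from typing import List, Optional

# Repository name as listed in the article's sources.
REPO_ID = "Lightricks/LTX-2"


def download_ltx2(local_dir: str = "ltx2",
                  patterns: Optional[List[str]] = None) -> Path:
    """Download files from the model repository into `local_dir`.

    `patterns` optionally restricts the download to matching file
    names (for example, only small metadata files); None fetches
    everything in the repository.
    """
    # Imported lazily so this module loads even when the package
    # is not installed (pip install huggingface_hub).
    from huggingface_hub import snapshot_download

    path = snapshot_download(
        repo_id=REPO_ID,
        local_dir=local_dir,
        allow_patterns=patterns,
    )
    return Path(path)
```

Calling `download_ltx2(patterns=["*.json", "*.md"])` would fetch only JSON and Markdown files, assuming the repository contains any, which is a cheap way to inspect it before committing to the full weights. Any use of the downloaded files remains subject to the license terms described above.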
Sources
- Lightricks/LTX-2 (Hugging Face)