The Open Weights

The daily record of open-source AI. New model releases, leaderboards, and what's coming next — written for people who ship.


© 2026 The Open Weights. An independent publication.

Aggregated by Claude · written with Gemini · curated by humans.


Baidu Releases NAVA for Text-to-Video with Audio

The new model from the Chinese tech giant uses a Multimodal Diffusion Transformer to generate synchronized audio and video from text or image prompts.

May 29, 2026
Baidu · Text → Video

Baidu has released the weights for NAVA, a generative model that produces video with synchronized audio. NAVA, which stands for Native Audio-Video Animation, accepts either a text prompt or a text prompt paired with an image and generates short video clips. The model and examples are available in its Hugging Face repository.

Under the hood, NAVA uses a Multimodal Diffusion Transformer (MMDiT). This design lets the model process and integrate different data types, such as text and image features, within the same transformer blocks, which helps it form a more cohesive understanding of the prompt. The model is built on Wan2.2, the open video foundation model from Alibaba, and extends it to multimodal generation.
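To make the idea of mixing modalities inside one transformer block concrete, here is a minimal NumPy sketch of joint attention in the style commonly associated with MMDiT designs. It is an illustration under assumptions, not NAVA's code: the function name `joint_attention`, the single attention head, the token counts, and the random projection weights are all hypothetical. Each modality keeps its own query, key, and value projections, but attention is computed once over the concatenated sequence, so every token can attend to tokens from both streams.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def joint_attention(text_tokens, visual_tokens, text_proj, visual_proj):
    """Single-head joint attention over two token streams (illustrative only).

    text_tokens:   array of shape (n_text, dim)
    visual_tokens: array of shape (n_visual, dim)
    text_proj, visual_proj: dicts with "q", "k", "v" matrices of shape (dim, dim).
    Each stream is projected with its own weights, then both streams share
    one attention operation over the concatenated sequence.
    """
    q = np.concatenate([text_tokens @ text_proj["q"], visual_tokens @ visual_proj["q"]])
    k = np.concatenate([text_tokens @ text_proj["k"], visual_tokens @ visual_proj["k"]])
    v = np.concatenate([text_tokens @ text_proj["v"], visual_tokens @ visual_proj["v"]])

    # Scaled dot-product attention across all tokens from both modalities.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)
    mixed = weights @ v

    # Split the result back into per-modality streams.
    n_text = text_tokens.shape[0]
    return mixed[:n_text], mixed[n_text:], weights
```

The returned attention matrix covers the full concatenated sequence, which is what distinguishes this from a cross-attention layer where only one stream queries the other.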

A More Efficient Method

Rather than the standard denoising diffusion objective, NAVA is trained with flow matching. In this more recent approach, the network learns a velocity field that carries noise samples toward data along simple, often near-straight paths, which can make training more efficient and let sampling finish in fewer steps. The choice reflects a broader trend toward more computationally efficient generative architectures.
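The flow-matching idea can be sketched in a few lines of NumPy. This is a minimal illustration of the common linear-path (rectified-flow style) variant, and it is an assumption that NAVA uses this particular formulation; the function names `flow_matching_pair`, `flow_matching_loss`, and `euler_sample`, and the `predict_velocity` callable standing in for the network, are hypothetical. Training draws a noise sample and a data sample, interpolates between them at a random time, and regresses the model output onto the constant velocity that connects the two. Sampling integrates the learned velocity from noise toward data.

```python
import numpy as np


def flow_matching_pair(x0, x1, t):
    """Build one training pair for linear-path flow matching.

    x0: noise samples, x1: data samples (same shape), t: times in [0, 1]
    with one value per batch element. Returns the interpolated point x_t
    and the target velocity x1 - x0.
    """
    t = t.reshape(-1, *([1] * (x1.ndim - 1)))  # broadcast over feature dims
    x_t = (1.0 - t) * x0 + t * x1
    v_target = x1 - x0
    return x_t, v_target


def flow_matching_loss(predict_velocity, x1, rng):
    """Mean squared error between predicted and target velocity."""
    x0 = rng.standard_normal(x1.shape)      # noise endpoint
    t = rng.uniform(size=x1.shape[0])       # one random time per sample
    x_t, v_target = flow_matching_pair(x0, x1, t)
    v_pred = predict_velocity(x_t, t)
    return float(np.mean((v_pred - v_target) ** 2))


def euler_sample(predict_velocity, x0, num_steps):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with fixed Euler steps."""
    x = x0.copy()
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = np.full(x.shape[0], i * dt)
        x = x + dt * predict_velocity(x, t)
    return x
```

When the learned paths are close to straight, a small `num_steps` can be enough, which is the source of the faster-inference argument.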

The release of NAVA adds another open-weights model to the competitive text-to-video landscape. Its ability to generate audio natively alongside video is a key differentiator, since other models often add audio in a separate post-processing step. Although the model is publicly available, it uses a custom license, so developers and researchers should review the terms before incorporating it into their work.

Sources

  • baidu/NAVA (Hugging Face)


Specs

Parameters: not listed
Context window: not listed
License: Other
Downloads: 477

Modalities

Text → Video

More in Text → Video

JD.com Enters Open-Source AI Video with JoyAI-Echo

The Chinese e-commerce giant has released a new model capable of generating long-form, multi-shot videos with synchronized audio from text prompts.

Jun 2, 2026
JD · JoyAI-Echo · Text → Video

NVIDIA Releases SANA, a Camera-Controllable Video Model

The new model, SANA-WM, uses a bidirectional diffusion process to give creators fine-grained control over camera movement and video editing.

May 18, 2026
NVIDIA · SANA-WM Bidirectional · Image → Video

ByteDance Releases Lance, a Unified Generative AI Model

The 3-billion-parameter model handles image and video generation, editing, and understanding from a single set of weights under a permissive license.

May 15, 2026
ByteDance · Lance · Any-to-Any