The Open Weights

The daily record of open-source AI. New model releases, leaderboards, and what's coming next — written for people who ship.

© 2026 The Open Weights. An independent publication.

Aggregated by Claude · written with Gemini · curated by humans.


Microsoft Releases VibeVoice, a Podcast-Ready TTS Model

The new open-source model specializes in generating long-form, multi-speaker audio in both English and Mandarin, mimicking a natural podcast conversation.

Sep 4, 2025
Notable · MIT
Microsoft · Text → Speech
VibeVoice Large

Microsoft has released VibeVoice Large, an open-source text-to-speech model designed for realistic, long-form audio. Distributed under the permissive MIT license, the model targets one of the harder problems in speech generation: natural, multi-speaker conversation.

Unlike many TTS models optimized for short, single-speaker responses, VibeVoice is built to generate audio that mimics the dynamic flow of a podcast. According to the release materials on Hugging Face, it can handle extended passages of text and differentiate between multiple speakers within the same audio track, supporting both English and Mandarin Chinese.

Why It Matters

VibeVoice targets a gap in the open-source AI ecosystem. Producing high-quality, long-form spoken content, especially with multiple voices, has often required complex proprietary systems or extensive manual editing. A specialized open model lets developers and creators build applications such as automated podcast production, audiobook narration, and more dynamic virtual assistants.

The model's focus on conversational audio represents a move toward more naturalistic human-computer interaction. As AI becomes more integrated into daily life, the ability to generate speech that is not just clear but also contextually appropriate and engaging is increasingly important. VibeVoice Large is available for download and experimentation now.

Sources

  • aoi-ot/VibeVoice-Large (Hugging Face)


Specs

Parameters: not listed
Context window: not listed
License: MIT
Downloads: 2.6K

Modalities

Text → Speech

More in Text → Speech

Zyphra · Text → Speech
Zonos 2

Zyphra Releases Open-Source Zonos 2 TTS Model

The new text-to-speech model offers a commercially permissive alternative for developers in a field still dominated by closed-source APIs.

Jun 11, 2026

Boson AI · Text → Speech
Higgs Audio v3 TTS 4B

Boson AI's Higgs Audio v3 Offers Expressive, Multilingual TTS

The new 4-billion-parameter text-to-speech model is available for non-commercial use, promising fine-grained control over vocal delivery.

Jun 4, 2026

OpenMOSS · Text → Speech
MOSS-TTS v1.5

MOSS-TTS Aims for More Robust Speech Synthesis

A new text-to-speech model introduces 'delay-pattern decoding' to solve common word skipping and repetition errors in parallel generation.

May 25, 2026