The Open Weights

The daily record of open-source AI. New model releases, leaderboards, and what's coming next — written for people who ship.

© 2026 The Open Weights. An independent publication.

Aggregated by Claude · written with Gemini · curated by humans.

Microsoft Releases VibeVoice for Long-Form Audio

The new 1.5-billion-parameter text-to-speech model is designed to generate natural, multi-speaker audio for podcasts and other long-form content.

Aug 25, 2025
Microsoft · Text → Speech · VibeVoice-1.5B · MIT license

Microsoft has released VibeVoice-1.5B, an open-source model for generating high-quality, long-form speech. The 1.5-billion-parameter model targets a particularly challenging area of text-to-speech (TTS): natural-sounding, multi-speaker conversation.

The model is designed to produce podcast-style audio and supports both English and Chinese. VibeVoice is released under the permissive MIT license, which allows use in both research and commercial projects.

Key Capabilities

  • Long-form Generation: Produces extended audio rather than only short, sentence-length clips.
  • Multi-speaker Support: Can synthesize conversations involving different voices.
  • Bilingual: Supports both English and Chinese text input.
  • Permissive Licensing: Released under the MIT license, encouraging wide adoption.

The release matters because it gives developers an open-source alternative for producing sophisticated audio content, which has often been the domain of proprietary services. Developers and creators can now experiment with generating entire podcast episodes, dynamic audiobooks, or more complex conversational agents. The model and usage instructions are available in its Hugging Face repository.
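
For readers who want to pull the weights locally, here is a minimal sketch. It assumes the `huggingface_hub` Python package is installed and uses the repository ID listed under Sources. The helper names `repo_url` and `download_weights` are illustrative only, and the repository's own instructions remain the reference for running inference.

```python
from typing import Optional

# Repository ID as listed in the article's sources.
REPO_ID = "microsoft/VibeVoice-1.5B"


def repo_url(repo_id: str = REPO_ID) -> str:
    """Return the Hugging Face page for a model repository."""
    return f"https://huggingface.co/{repo_id}"


def download_weights(repo_id: str = REPO_ID, local_dir: Optional[str] = None) -> str:
    """Download every file in the repository and return the local path.

    Assumes `huggingface_hub` is installed (pip install huggingface_hub).
    The import is deferred so the rest of this module works without it.
    """
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id, local_dir=local_dir)


if __name__ == "__main__":
    print(repo_url())
    # Uncomment to fetch the weights (a large download):
    # print(download_weights())
```

Calling `download_weights()` with no arguments stores the files in the local Hugging Face cache. Pass `local_dir` to place them in a specific folder instead.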

Sources

  • microsoft/VibeVoice-1.5B (Hugging Face)

Specs

  • Parameters: 1.5B
  • Context window: not specified
  • License: MIT
  • Downloads: 161.4K

Modalities

Text → Speech

More in Text → Speech

Zyphra · Text → Speech · Zonos 2

Zyphra Releases Open-Source Zonos 2 TTS Model

The new text-to-speech model offers a commercially permissive alternative for developers in a field still dominated by closed-source APIs.

Jun 11, 2026

Boson AI · Text → Speech · Higgs Audio v3 TTS 4B

Boson AI's Higgs Audio v3 Offers Expressive, Multilingual TTS

The new 4-billion-parameter text-to-speech model is available for non-commercial use, promising fine-grained control over vocal delivery.

Jun 4, 2026

OpenMOSS · Text → Speech · MOSS-TTS v1.5

MOSS-TTS Aims for More Robust Speech Synthesis

A new text-to-speech model introduces 'delay-pattern decoding' to solve common word skipping and repetition errors in parallel generation.

May 25, 2026