The Open Weights
© 2026 The Open Weights. An independent publication.

Aggregated by Claude · written with Gemini · curated by humans.


Microsoft Releases VibeVoice for Real-Time AI Speech

The new 500-million-parameter model is designed for generating natural, long-form speech with very low latency for interactive applications.

Dec 4, 2025
Microsoft · Text → Speech

Microsoft has released VibeVoice-Realtime-0.5B, a new open-source model focused on generating high-quality speech with minimal delay. As a streaming text-to-speech (TTS) system, it's engineered to begin producing audio almost instantly, making it suitable for interactive applications where responsiveness is critical.

The model addresses a key challenge in generative audio: latency. While many TTS models produce natural-sounding speech, they often require the entire text input before synthesis can begin, so the listener hears nothing until the whole passage has been processed. VibeVoice's streaming architecture targets use cases such as real-time conversational agents, live narration, and accessibility tools, where a natural, uninterrupted flow is essential.
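To make the latency difference concrete, here is a minimal Python sketch comparing batch and streaming synthesis. It does not use VibeVoice or its API: `fake_synthesize`, `TextSource`, `batch_tts`, `streaming_tts`, and `chunks_before_first_audio` are hypothetical names, the "model" simply returns silence, and the 24 kHz sample rate and 60 ms of audio per character are assumptions. Instead of measuring wall-clock time, it counts how many text chunks must arrive before the first audio buffer is ready.

```python
from typing import Callable, Iterator, List

# Assumed for illustration: 24 kHz output, ~60 ms of audio per character.
SAMPLE_RATE = 24_000
SAMPLES_PER_CHAR = SAMPLE_RATE * 60 // 1000  # 1440 samples


def fake_synthesize(text: str) -> List[float]:
    """Hypothetical stand-in for a TTS model call (not VibeVoice's API).
    Returns silence whose length scales with the input text."""
    return [0.0] * (len(text) * SAMPLES_PER_CHAR)


class TextSource:
    """Simulates text arriving incrementally (e.g. from a chat model)
    and records how many chunks have been delivered so far."""

    def __init__(self, chunks: List[str]) -> None:
        self.chunks = chunks
        self.delivered = 0

    def __iter__(self) -> Iterator[str]:
        for chunk in self.chunks:
            self.delivered += 1
            yield chunk


def batch_tts(source: TextSource) -> Iterator[List[float]]:
    # Non-streaming: wait for the complete text, then synthesize once.
    full_text = "".join(source)
    yield fake_synthesize(full_text)


def streaming_tts(source: TextSource) -> Iterator[List[float]]:
    # Streaming: synthesize each chunk as soon as it arrives.
    for chunk in source:
        yield fake_synthesize(chunk)


TTS = Callable[[TextSource], Iterator[List[float]]]


def chunks_before_first_audio(tts: TTS, chunks: List[str]) -> int:
    """How many text chunks had arrived when the first audio buffer was ready."""
    source = TextSource(chunks)
    next(tts(source))  # pull only the first audio buffer
    return source.delivered


if __name__ == "__main__":
    reply = ["Sure, ", "here is ", "the weather ", "for today: ", "sunny."]
    print("batch:", chunks_before_first_audio(batch_tts, reply))          # batch: 5
    print("streaming:", chunks_before_first_audio(streaming_tts, reply))  # streaming: 1
```

Both versions eventually produce the same total amount of audio; the difference is that the batch version cannot emit anything until all five chunks have arrived, while the streaming version has audio ready after the first. A production system would likely buffer a little text ahead of each synthesis call for more natural prosody; this sketch ignores that.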

A Compact and Capable Architecture

VibeVoice-Realtime-0.5B is a compact model, with roughly 500 million parameters. According to its release page on Hugging Face, it is built on Alibaba's Qwen2.5-0.5B language model. Adapting a capable, general-purpose foundation model to a specialized task such as TTS, rather than training from scratch, is a common and efficient strategy in AI development.

Key features of the model include:

  • Real-time streaming: Enables low-latency audio generation.
  • Long-form speech: Capable of handling extended text inputs without degradation.
  • Efficient size: The 0.5-billion-parameter architecture is suitable for a wide range of hardware.
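The "efficient size" point can be checked with back-of-the-envelope arithmetic. The sketch below assumes the 500M figure from the spec sheet covers all of the model's weights and uses the standard byte width of each numeric format. It estimates weight storage only, leaving out activations, caches, and runtime overhead, and it does not imply that the released checkpoint is distributed in, or validated at, every precision listed.

```python
# Rough weight-memory estimate for a model of a given parameter count.
# Assumption: 500M parameters in total (the figure on the model's spec sheet).
PARAMS = 500_000_000

# Bytes needed to store one parameter in common numeric formats.
BYTES_PER_PARAM = {"fp32": 4.0, "bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(params: int, dtype: str) -> float:
    """Weights only, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    for dtype in BYTES_PER_PARAM:
        print(f"{dtype}: {weight_memory_gb(PARAMS, dtype):.2f} GB")
```

At bf16 the weights come to roughly 1 GB. The same arithmetic puts a 7-billion-parameter model at about 14 GB in bf16, which illustrates why the 0.5B size matters for deployment on modest hardware.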

Developers and researchers can access VibeVoice-Realtime-0.5B on Hugging Face. However, its use is restricted by a custom license that permits research and non-commercial applications only.

Sources

  • microsoft/VibeVoice-Realtime-0.5B (Hugging Face)

Specs

  • Parameters: 500M
  • Context window: not listed
  • License: Other
  • Downloads: 567.1K
  • Modalities: Text → Speech
