The Open Weights

The daily record of open-source AI. New model releases, leaderboards, and what's coming next — written for people who ship.


© 2026 The Open Weights. An independent publication.

Aggregated by Claude · written with Gemini · curated by humans.


Microsoft Releases VibeVoice for Speech Transcription

The new open-source automatic speech recognition model handles multilingual transcription and speaker identification out of the box.

Jan 21, 2026
Microsoft · Speech → Text
VibeVoice ASR

Microsoft has released VibeVoice-ASR, a new foundation model for automatic speech recognition (ASR). The system, now available on Hugging Face, is designed to convert spoken audio into written text across multiple languages.

Beyond plain transcription, VibeVoice-ASR's key capability is integrated speaker diarization: identifying and labeling who is speaking and when. This matters for conversations with multiple participants, such as meetings, interviews, or panel discussions, because the transcript comes out already attributed to speakers, with no separate post-processing step.
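To illustrate what a diarized transcript looks like in practice, here is a minimal Python sketch. It assumes a hypothetical output format in which each segment carries a speaker label, start and end times in seconds, and the spoken text; VibeVoice-ASR's actual output schema may differ, and the names `Segment`, `merge_turns`, and `format_transcript` are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One stretch of speech attributed to a single speaker (hypothetical schema)."""
    speaker: str
    start: float  # seconds from the start of the audio
    end: float    # seconds from the start of the audio
    text: str


def merge_turns(segments: list[Segment]) -> list[Segment]:
    """Join consecutive segments from the same speaker into one speaking turn."""
    turns: list[Segment] = []
    for seg in sorted(segments, key=lambda s: s.start):
        if turns and turns[-1].speaker == seg.speaker:
            last = turns[-1]
            # Extend the previous turn instead of starting a new one.
            turns[-1] = Segment(last.speaker, last.start, seg.end, f"{last.text} {seg.text}")
        else:
            turns.append(seg)
    return turns


def format_transcript(segments: list[Segment]) -> str:
    """Render each speaking turn as a '[mm:ss] Speaker: text' line."""
    lines = []
    for turn in merge_turns(segments):
        minutes, seconds = divmod(int(turn.start), 60)
        lines.append(f"[{minutes:02d}:{seconds:02d}] {turn.speaker}: {turn.text}")
    return "\n".join(lines)


if __name__ == "__main__":
    demo = [
        Segment("Speaker 1", 0.0, 2.5, "Welcome to the panel."),
        Segment("Speaker 1", 2.5, 4.0, "Let's begin."),
        Segment("Speaker 2", 4.2, 7.9, "Thanks for having me."),
    ]
    print(format_transcript(demo))
```

Run as a script, this prints two lines, one per speaking turn: Speaker 1's two segments are merged into a single turn starting at [00:00], and Speaker 2's turn starts at [00:04].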

Why It Matters

The release adds a notable entry to the competitive open-source speech recognition space, which already includes popular models such as OpenAI's Whisper. While Microsoft has not yet published detailed performance benchmarks, VibeVoice-ASR's built-in diarization offers a more streamlined option for developers who would otherwise need to combine separate models for transcription and speaker identification.
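For comparison, the two-model pipeline that built-in diarization replaces typically ends with an alignment step: a transcription model produces timestamped text, a separate diarization model produces timestamped speaker turns, and the two outputs are joined. The sketch below assumes both outputs are already available as hypothetical Python objects (`AsrSegment`, `SpeakerTurn`) and uses one simple heuristic, assigning each transcript segment to the speaker whose turn overlaps it for the longest time. It is not how VibeVoice-ASR works internally, and real toolkits may align at the word level or treat overlapping speech differently.

```python
from dataclasses import dataclass


@dataclass
class AsrSegment:
    """Timestamped text from a transcription model (hypothetical schema)."""
    start: float  # seconds
    end: float    # seconds
    text: str


@dataclass
class SpeakerTurn:
    """Timestamped speaker label from a diarization model (hypothetical schema)."""
    speaker: str
    start: float  # seconds
    end: float    # seconds


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two time intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(
    asr: list[AsrSegment], turns: list[SpeakerTurn]
) -> list[tuple[str, AsrSegment]]:
    """Label each transcript segment with the speaker whose turn overlaps it most."""
    labeled = []
    for seg in asr:
        best_speaker, best_overlap = "unknown", 0.0
        for turn in turns:
            ov = overlap(seg.start, seg.end, turn.start, turn.end)
            if ov > best_overlap:
                best_speaker, best_overlap = turn.speaker, ov
        labeled.append((best_speaker, seg))
    return labeled


if __name__ == "__main__":
    asr = [AsrSegment(0.0, 3.0, "Hello everyone."), AsrSegment(3.0, 6.0, "Hi, thanks.")]
    turns = [SpeakerTurn("SPK_0", 0.0, 3.2), SpeakerTurn("SPK_1", 3.2, 6.5)]
    for speaker, seg in assign_speakers(asr, turns):
        print(f"{speaker}: {seg.text}")
```

Segments that overlap no speaker turn keep the label "unknown", one of several edge cases a hand-built pipeline has to handle and that an integrated model avoids exposing to the developer.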

Prospective users should note the license. According to the official model card, VibeVoice-ASR is released for research purposes only. That limits its use in commercial products for now, but it gives researchers working on speech processing a useful new tool.

Sources

  • microsoft/VibeVoice-ASR (Hugging Face)


Specs

Parameters: not listed
Context window: not listed
License: Other
Downloads: 541.5K

Modalities

Speech → Text

More in Speech → Text

zhifeixie · Speech → Text
Mega-ASR

Mega-ASR Improves on Qwen for Speech Recognition

Researcher Zhifei Xie has released a 1.7B-parameter model that refines Alibaba's Qwen3-ASR, showing improved performance on English and Chinese transcription benchmarks.

May 19, 2026
NVIDIA · Speech → Text
Nemotron 3.5 ASR Streaming 0.6B

NVIDIA Releases Nemotron-3.5 Streaming ASR Model

The 600-million-parameter model uses a FastConformer architecture for real-time, multilingual speech-to-text applications.

May 15, 2026
Xiaomi · Speech → Text
MiMo-V2.5-ASR

Xiaomi Releases MiMo Model for Speech Recognition

The new open-source model from the Chinese tech giant offers automatic speech recognition for Mandarin, Cantonese, and English under a permissive MIT license.

Apr 23, 2026