The Open Weights

The daily record of open-source AI. New model releases, leaderboards, and what's coming next — written for people who ship.

© 2026 The Open Weights. An independent publication.

Aggregated by Claude · written with Gemini · curated by humans.


NVIDIA Releases Real-Time Speaker Diarization Model

The new Sortformer-based model is designed for streaming audio, identifying up to four distinct speakers in real time.

Oct 22, 2025
NVIDIA · Speech → Text
Streaming Sortformer Diarization 4spk v2.1

NVIDIA has released a new model aimed at a core challenge in audio processing: figuring out "who spoke when" in real time. The new system, Streaming Sortformer Diarization 4spk v2.1, is designed for speaker diarization on continuous audio streams, a key component for building sophisticated conversational AI.

Unlike traditional diarization systems that process an entire audio file after it has been recorded, this model's "streaming" capability allows it to work on live audio. This is essential for applications like real-time meeting transcription, automated call center analysis, and live captioning, where identifying the current speaker without delay is critical.
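The difference between offline and streaming processing can be illustrated with a minimal sketch. It is not NVIDIA's implementation: `iter_chunks`, `stream_labels`, and the `label_chunk` callback are hypothetical names, and the callback stands in for whatever model scores each chunk. The point is only that results are emitted as each chunk completes, rather than after the whole recording is available.

```python
from typing import Callable, Iterable, Iterator, List, Tuple


def iter_chunks(samples: Iterable[float], chunk_size: int) -> Iterator[List[float]]:
    """Yield fixed-size chunks as samples arrive; the last chunk may be shorter."""
    buffer: List[float] = []
    for sample in samples:
        buffer.append(sample)
        if len(buffer) == chunk_size:
            yield buffer
            buffer = []
    if buffer:
        yield buffer


def stream_labels(
    samples: Iterable[float],
    sample_rate: int,
    chunk_size: int,
    label_chunk: Callable[[List[float]], str],  # hypothetical stand-in for a model
) -> Iterator[Tuple[float, float, str]]:
    """Yield (start_sec, end_sec, label) as soon as each chunk is complete."""
    start = 0
    for chunk in iter_chunks(samples, chunk_size):
        end = start + len(chunk)
        yield (start / sample_rate, end / sample_rate, label_chunk(chunk))
        start = end
```

A caller would consume `stream_labels` in a loop and act on each tuple immediately, which is what makes live captioning or live meeting transcription possible. The chunk size sets the trade-off: smaller chunks lower latency but give the labeler less context.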

Technical Details

The model uses a Transformer-based architecture called Sortformer and is optimized for common conversational scenarios. Its key features include:

  • Speaker Capacity: Supports up to four distinct speakers.
  • Real-time Processing: Designed for low-latency, continuous input.
  • Toolkit Integration: Intended for use with the NVIDIA NeMo toolkit for conversational AI.
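To make "who spoke when" concrete, the sketch below turns per-frame speaker activity scores into timed segments for up to four speakers. It is a generic post-processing illustration, not the model's or NeMo's actual code: the function name `frames_to_segments`, the 0.5 threshold, and the 0.08-second default frame length are all assumptions chosen for the example.

```python
from typing import List, Tuple

# (start_sec, end_sec, speaker_index)
Segment = Tuple[float, float, int]


def frames_to_segments(
    activity: List[List[float]],
    frame_sec: float = 0.08,   # assumed frame length, for illustration only
    threshold: float = 0.5,    # assumed decision threshold
) -> List[Segment]:
    """Convert frame-level activity scores (frames x speakers) into segments.

    Each speaker is handled independently, so overlapping speech produces
    overlapping segments.
    """
    if not activity:
        return []
    segments: List[Segment] = []
    num_speakers = len(activity[0])
    for spk in range(num_speakers):
        start = None  # index of the frame where the current segment began
        for t, frame in enumerate(activity):
            active = frame[spk] >= threshold
            if active and start is None:
                start = t
            elif not active and start is not None:
                segments.append((start * frame_sec, t * frame_sec, spk))
                start = None
        if start is not None:  # speaker still active at the end of the input
            segments.append((start * frame_sec, len(activity) * frame_sec, spk))
    return sorted(segments)
```

With a score matrix of shape frames × 4, each column corresponds to one speaker slot. Because the slots are thresholded separately, two people talking at once appear as two segments covering the same time span.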

The model is available for download on the Hugging Face Hub. It is released under a custom NVIDIA AI Foundation Models EULA rather than a standard open-source license, and those terms may limit its use in some commercial applications.

Sources

  • nvidia/diar_streaming_sortformer_4spk-v2.1 (Hugging Face)


Specs

  • License: Other
  • Downloads: 28.4K
  • Modality: Speech → Text
