The Open Weights



© 2026 The Open Weights. An independent publication.

Aggregated by Claude · written with Gemini · curated by humans.


Qwen Releases 0.6B Model for Audio-Text Alignment

The new open-source tool, based on the Qwen3 architecture, precisely synchronizes audio recordings with their corresponding text transcripts.

Jan 28, 2026
Notable · Apache 2.0
Qwen · Alibaba · Speech → Text
Qwen3 ForcedAligner 0.6B

Alibaba's Qwen team has released Qwen3 ForcedAligner 0.6B, a specialized model for audio processing. The compact 600-million-parameter model targets one specific task in speech AI: aligning existing text with an audio recording.

Unlike standard speech-to-text models that generate text from scratch, a forced aligner takes both an audio file and its transcript as input. It then determines the precise start and end times for each word in the audio, effectively synchronizing the two. This capability is essential for creating accurately timed subtitles, preparing high-quality datasets for training other speech models, and conducting phonetic research.
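To illustrate the subtitle use case, here is a minimal sketch of how word-level timestamps could be grouped into SRT cues. It does not call the Qwen model. The `AlignedWord` type, the `words_to_srt` helper, and the `max_words` and `max_gap` thresholds are hypothetical names and values chosen for this example, and the input is assumed to be a list of words with start and end times in seconds, which is the kind of output a forced aligner produces.

```python
from dataclasses import dataclass


@dataclass
class AlignedWord:
    """One word with its start/end time in seconds (assumed aligner output)."""
    text: str
    start: float
    end: float


def format_srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words, max_words=7, max_gap=0.6):
    """Group aligned words into subtitle cues and render them as SRT text.

    A new cue starts when the current one holds max_words words, or when
    the silence before the next word exceeds max_gap seconds.
    """
    cues = []
    current = []
    for word in words:
        too_long = len(current) >= max_words
        long_pause = bool(current) and word.start - current[-1].end > max_gap
        if current and (too_long or long_pause):
            cues.append(current)
            current = []
        current.append(word)
    if current:
        cues.append(current)

    blocks = []
    for index, cue in enumerate(cues, start=1):
        start = format_srt_time(cue[0].start)
        end = format_srt_time(cue[-1].end)
        text = " ".join(w.text for w in cue)
        blocks.append(f"{index}\n{start} --> {end}\n{text}")
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    demo = [
        AlignedWord("Hello", 0.0, 0.4),
        AlignedWord("world", 0.45, 0.9),
        AlignedWord("again", 2.0, 2.5),
    ]
    print(words_to_srt(demo))
```

Splitting on pauses as well as word count keeps each cue close to a natural phrase. Real subtitle pipelines usually add limits on line length and minimum cue duration as well.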

The model is built on the Qwen3 architecture and is available on the Hugging Face Hub under the permissive Apache 2.0 license, which allows commercial use. Its relatively small size suggests it can run efficiently, which would make forced alignment more accessible to developers and researchers.

The release of Qwen3 ForcedAligner adds another foundational component to the open-source audio ecosystem, providing a key tool for building more sophisticated applications that handle spoken language.

Sources

  • Qwen/Qwen3-ForcedAligner-0.6B (Hugging Face)



Specs

  • Parameters: 600M
  • Context window: not specified
  • License: Apache-2.0
  • Downloads: 348.2K

Modalities

Speech → Text
