The Open Weights

© 2026 The Open Weights. An independent publication.

Aggregated by Claude · written with Gemini · curated by humans.

NVIDIA Fuses LLM and ASR in Canary-Qwen 2.5B Model

The 2.5 billion-parameter speech model combines a FastConformer encoder with a Qwen LLM decoder, a hybrid approach to transcription.

Jun 26, 2025
NVIDIA · Speech → Text
Canary-Qwen 2.5B

NVIDIA has released Canary-Qwen 2.5B, an automatic speech recognition (ASR) model built on a hybrid architecture. Rather than relying on a single network purpose-built for transcription, the 2.5 billion-parameter model pairs a specialized audio encoder with a general-purpose large language model that decodes the text.

This hybrid design is the model's key feature. A FastConformer encoder, a component optimized for processing audio efficiently, turns the input speech into a sequence of acoustic representations. Those representations are handed off to a decoder based on a Qwen large language model, so the system can draw on the LLM's text generation and contextual understanding, with the aim of producing more accurate and more natural-reading transcriptions.
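The encoder-to-decoder handoff described above can be sketched at the level of data shapes. This is a toy illustration, not NVIDIA's implementation: the function names, the dimensions, the subsampling factor of 8, and the averaging "encoder" are all assumptions chosen to keep the example small and dependency-free.

```python
# Toy, shape-level sketch of an "audio encoder + LLM decoder" pipeline.
# Every name and size here is a hypothetical stand-in, not the real model.

ENC_DIM = 4  # hypothetical encoder output width
LLM_DIM = 6  # hypothetical LLM embedding width


def encode_audio(frames, subsample=8):
    """Stand-in encoder: collapse each group of `subsample` frames into one vector.

    A real encoder learns this mapping; here we just average the window
    and scale the mean to fill ENC_DIM slots.
    """
    encoded = []
    for start in range(0, len(frames) - subsample + 1, subsample):
        window = frames[start:start + subsample]
        mean = sum(window) / subsample
        encoded.append([mean * (k + 1) for k in range(ENC_DIM)])
    return encoded


def make_projection():
    """Deterministic toy weights of shape ENC_DIM x LLM_DIM."""
    return [[0.1 * (i + j) for j in range(LLM_DIM)] for i in range(ENC_DIM)]


def project(vectors, weights):
    """Linear map from the encoder's space into the LLM's embedding space."""
    return [
        [sum(v[i] * weights[i][j] for i in range(ENC_DIM)) for j in range(LLM_DIM)]
        for v in vectors
    ]


def build_decoder_input(prompt_embeddings, audio_embeddings):
    """Concatenate prompt and audio embeddings into one input sequence."""
    return prompt_embeddings + audio_embeddings


def run_sketch():
    frames = [float(t % 5) for t in range(80)]        # 80 fake audio frames
    audio = project(encode_audio(frames), make_projection())
    prompt = [[0.0] * LLM_DIM for _ in range(3)]      # 3 fake prompt tokens
    return build_decoder_input(prompt, audio)


if __name__ == "__main__":
    sequence = run_sketch()
    print(len(sequence), len(sequence[0]))
```

The point of the sketch is the interface: the encoder shortens the audio sequence, a projection maps it into the decoder's embedding width, and the language model then sees audio and text positions in one sequence.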

According to NVIDIA's model card, the model targets English transcription and adds punctuation and capitalization automatically, two things ASR systems often handle poorly. Using an LLM as the "brain" for a specialized task reflects a broader trend in AI, in which generalist models are adapted to improve specific applications.

Canary-Qwen 2.5B is available on Hugging Face under a custom community license. The release gives developers another option for speech-to-text applications and a concrete example of how an audio encoder and an LLM decoder can be combined.

Sources

  • nvidia/canary-qwen-2.5b (Hugging Face)

Specs

  • Parameters: 2.5B
  • Context window: —
  • License: OTHER
  • Downloads: 92K

Modalities

Speech → Text

More in Speech → Text

zhifeixie · Speech → Text
Mega-ASR

Mega-ASR Improves on Qwen for Speech Recognition

Researcher Zhifei Xie has released a 1.7B-parameter model that refines Alibaba's Qwen3-ASR, showing improved performance on English and Chinese transcription benchmarks.

May 19, 2026

NVIDIA · Speech → Text
Nemotron 3.5 ASR Streaming 0.6B

NVIDIA Releases Nemotron-3.5 Streaming ASR Model

The 600-million-parameter model uses a FastConformer architecture for real-time, multilingual speech-to-text applications.

May 15, 2026

Xiaomi · Speech → Text
MiMo-V2.5-ASR

Xiaomi Releases MiMo Model for Speech Recognition

The new open-source model from the Chinese tech giant offers automatic speech recognition for Mandarin, Cantonese, and English under a permissive MIT license.

Apr 23, 2026