Qwen3 Family Expands into Speech Recognition
Alibaba's Qwen team has released a new 1.7-billion-parameter model designed specifically for automatic speech recognition.
The Qwen team at Alibaba has released Qwen3-ASR-1.7B, extending its new generation of models into the domain of audio processing. This 1.7-billion-parameter model is designed for automatic speech recognition (ASR), also known as speech-to-text, and is now available on the Hugging Face Hub.
Unlike general-purpose language models, Qwen3-ASR is a specialized tool focused on a single task: accurately transcribing spoken language into written text. This makes it a foundational component for a wide range of applications, from creating meeting transcripts and video subtitles to enabling voice-activated user interfaces and accessibility tools.
The release adds another open-source option to a field largely defined by models like OpenAI's Whisper. Qwen3-ASR-1.7B is released under the permissive Apache 2.0 license, which lets developers and businesses build on it, including in commercial products. At 1.7 billion parameters, its moderate size suggests a balance between accuracy and computational cost, which may make it suitable for a range of hardware environments.
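To make the size point concrete, here is a rough back-of-the-envelope sketch of how much memory the weights alone might occupy at common numeric precisions. It assumes the parameter count is exactly the nominal 1.7 billion from the model name, and the precisions listed are illustrative choices, not formats the release is confirmed to ship in. Activations, audio buffers, and runtime overhead are not counted.

```python
# Rough estimate of weight storage only; real memory use will be higher.
PARAMS = 1.7e9  # assumption: nominal parameter count taken from the model name

# Bytes per parameter at common precisions (illustrative, not confirmed
# as formats the model is distributed in).
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}


def weight_memory_gb(num_params: float, bytes_per_param: int) -> float:
    """Return approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


for dtype, size in BYTES_PER_PARAM.items():
    print(f"{dtype}: ~{weight_memory_gb(PARAMS, size):.1f} GB")
```

Under these assumptions, half-precision weights come to roughly 3.4 GB, which is the kind of footprint that could plausibly fit on a single consumer GPU.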
Sources
- Qwen/Qwen3-ASR-1.7B (Hugging Face)