Mega-ASR Improves on Qwen for Speech Recognition
Researcher Zhifei Xie has released a 1.7B-parameter model that refines Alibaba's Qwen3-ASR, showing improved performance on English and Chinese transcription benchmarks.

A new speech recognition model called Mega-ASR has been released by researcher Zhifei Xie, offering a strong open-source option for English and Chinese transcription. The 1.7 billion-parameter model is a fine-tuned version of Alibaba's recently released Qwen3-ASR-1.7B, demonstrating how community efforts can quickly build upon and specialize foundational models.
This release highlights the collaborative nature of open-source AI. By taking a capable base model and training it further on a curated mix of public and private datasets, the developer was able to enhance its performance for specific use cases. The model's permissive Apache 2.0 license allows it to be freely used and modified, even for commercial applications, encouraging further adoption and innovation.
Fine-Tuning for Robustness
According to performance metrics shared by the developer, Mega-ASR achieves lower word and character error rates than both its base model and OpenAI's Whisper-large-v3 across several key benchmarks. The improvements are particularly notable on Chinese-language datasets such as AISHELL-1 and WenetSpeech, suggesting that the additional training addressed weaknesses in Chinese transcription.
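For readers unfamiliar with these metrics, the sketch below shows how word error rate (WER) and character error rate (CER) are conventionally computed: the edit distance between a reference transcript and the model's output, divided by the reference length. It is a minimal illustration, not the developer's evaluation code. The function names and sample sentences are hypothetical, and real benchmark scoring usually applies text normalization (casing, punctuation) that is omitted here.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences:
    the minimum number of substitutions, insertions, and deletions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference token
                curr[j - 1] + 1,     # insertion of a hypothesis token
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edit distance / reference length.
    Spaces are ignored, which suits languages written without word spacing."""
    ref_chars = list(reference.replace(" ", ""))
    if not ref_chars:
        raise ValueError("reference must contain at least one character")
    hyp_chars = list(hypothesis.replace(" ", ""))
    return edit_distance(ref_chars, hyp_chars) / len(ref_chars)


# Hypothetical transcripts, each with one substituted token out of six.
english_wer = wer("the cat sat on the mat", "the cat sat on a mat")
chinese_cer = cer("今天天气很好", "今天天汽很好")
```

English benchmarks are typically reported as WER, while Chinese benchmarks such as AISHELL-1 are typically reported as CER, because written Chinese has no whitespace word boundaries.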
For developers and researchers working with English or Chinese audio, Mega-ASR is an accessible, permissively licensed option for automatic speech recognition. The model can be downloaded from its Hugging Face repository, where the author has also provided details on its training process and evaluation results.
Sources
- zhifeixie/Mega-ASR (Hugging Face)