
Mega-ASR Improves on Qwen for Speech Recognition

Researcher Zhifei Xie has released a 1.7B-parameter model that refines Alibaba's Qwen3-ASR, showing improved performance on English and Chinese transcription benchmarks.

May 19, 2026

License: Apache 2.0

Researcher Zhifei Xie has released Mega-ASR, an open-source speech recognition model for English and Chinese transcription. The 1.7-billion-parameter model is a fine-tuned version of Alibaba's recently released Qwen3-ASR-1.7B, and it shows how quickly community work can build on and specialize a foundation model.

This release highlights the collaborative nature of open-source AI. By taking a capable base model and training it further on a curated mix of public and private datasets, the developer improved its performance for specific use cases. The permissive Apache 2.0 license allows the model to be used and modified freely, including in commercial applications, which should encourage further adoption.

Fine-Tuning for Robustness

According to performance metrics shared by the developer, Mega-ASR achieves lower word and character error rates than both its base model and OpenAI's Whisper-large-v3 across several benchmarks. The improvements are most notable on Chinese-language datasets such as AISHELL-1 and WenetSpeech, which suggests the additional training addressed weaknesses in the base model.
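For readers unfamiliar with these metrics, the sketch below shows how word error rate (WER) and character error rate (CER) are conventionally defined: the edit distance between a reference transcript and the model's output, divided by the reference length. It is illustrative only. The function names and example sentences are made up for this sketch, and the article does not describe the developer's actual evaluation pipeline, which would normally also normalize casing and punctuation before scoring.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences of tokens."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference token missing from output
                curr[j - 1] + 1,     # insertion: extra token in output
                prev[j - 1] + cost,  # substitution, or a match when cost is 0
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edits divided by reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference transcript is empty")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edits, ignoring spaces."""
    ref_chars = list(reference.replace(" ", ""))
    if not ref_chars:
        raise ValueError("reference transcript is empty")
    hyp_chars = list(hypothesis.replace(" ", ""))
    return edit_distance(ref_chars, hyp_chars) / len(ref_chars)


if __name__ == "__main__":
    # Hypothetical transcripts, not taken from any benchmark.
    print(wer("the cat sat on the mat", "the cat sat on mat"))
    print(cer("今天天气很好", "今天天汽很好"))
```

Chinese benchmarks are usually scored with CER because written Chinese has no spaces between words, so splitting on whitespace would not give meaningful word counts.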

For developers and researchers working with English or Chinese audio, Mega-ASR is a capable, freely available option for automatic speech recognition. The model can be downloaded from its Hugging Face repository, where the author also documents the training process and evaluation results.

Sources

zhifeixie/Mega-ASR (Hugging Face)
