The Open Weights

© 2026 The Open Weights. An independent publication.

Aggregated by Claude · written with Gemini · curated by humans.

Mega-ASR Improves on Qwen for Speech Recognition

Researcher Zhifei Xie has released a 1.7B-parameter model that refines Alibaba's Qwen3-ASR, showing improved performance on English and Chinese transcription benchmarks.

May 19, 2026
Update · Apache 2.0
zhifeixie · Speech → Text
Mega-ASR

Researcher Zhifei Xie has released Mega-ASR, an open-source speech recognition model for English and Chinese transcription. The 1.7-billion-parameter model is a fine-tuned version of Alibaba's recently released Qwen3-ASR-1.7B, and it shows how quickly the community can build on and specialize foundation models.

The release illustrates the collaborative nature of open-source AI. By training a capable base model further on a curated mix of public and private datasets, the developer improved its accuracy on English and Chinese transcription. The permissive Apache 2.0 license allows the model to be freely used and modified, including in commercial applications.

Fine-Tuning for Robustness

According to performance metrics shared by the developer, Mega-ASR achieves lower word and character error rates than both its base model and OpenAI's Whisper-large-v3 across several key benchmarks. The improvements are most pronounced on Chinese-language datasets such as AISHELL-1 and WenetSpeech, which suggests the additional training addressed weaknesses in the base model's Chinese transcription.
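
For readers unfamiliar with the two metrics, here is a minimal sketch of how word error rate (WER) and character error rate (CER) are commonly computed: the edit distance between the reference transcript and the model output, divided by the reference length. This is not the developer's evaluation code. The function names are illustrative, and real benchmark scripts usually normalize casing and punctuation first, which this sketch does not do.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance: fewest substitutions, deletions, and insertions."""
    prev = list(range(len(hyp) + 1))  # distance from an empty reference
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference token missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis token)
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edits divided by reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate, ignoring whitespace (typical for Chinese text)."""
    ref_chars = [c for c in reference if not c.isspace()]
    hyp_chars = [c for c in hypothesis if not c.isspace()]
    if not ref_chars:
        raise ValueError("reference must contain at least one character")
    return edit_distance(ref_chars, hyp_chars) / len(ref_chars)


if __name__ == "__main__":
    # One substitution ("sat" -> "sit") and one deletion ("the"): 2 / 6 words.
    print(round(wer("the cat sat on the mat", "the cat sit on mat"), 3))
    # One wrong character out of six.
    print(round(cer("今天天气很好", "今天天汽很好"), 3))
```

WER is the usual choice for English, where words are space-delimited, while CER is generally reported for Chinese, where there are no word boundaries to split on. In both cases lower is better.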

For developers and researchers working with English or Chinese audio, Mega-ASR is a freely licensed option for automatic speech recognition. The model can be downloaded from its Hugging Face repository, where the author has also published details of the training process and evaluation results.

Sources

  • zhifeixie/Mega-ASR (Hugging Face)



Specs

Parameters: 1.7B
Context window: not listed
License: Apache-2.0
Downloads: 0

Modalities

Speech → Text

More in Speech → Text

NVIDIA · Speech → Text
Nemotron 3.5 ASR Streaming 0.6B

NVIDIA Releases Nemotron-3.5 Streaming ASR Model

The 600-million-parameter model uses a FastConformer architecture for real-time, multilingual speech-to-text applications.

May 15, 2026
Xiaomi · Speech → Text
MiMo-V2.5-ASR

Xiaomi Releases MiMo Model for Speech Recognition

The new open-source model from the Chinese tech giant offers automatic speech recognition for Mandarin, Cantonese, and English under a permissive MIT license.

Apr 23, 2026
IBM · Speech → Text
Granite Speech 4.1 2B

IBM Releases 2B Granite Model for Multilingual Speech

The new two-billion-parameter model offers transcription capabilities for at least five major languages under a permissive Apache 2.0 license.

Apr 16, 2026