
Xiaomi Releases MiMo Model for Speech Recognition

The new open-source model from the Chinese tech giant offers automatic speech recognition for Mandarin, Cantonese, and English under a permissive MIT license.

Apr 23, 2026


Chinese technology company Xiaomi has released MiMo-V2.5-ASR, an open-source model for automatic speech recognition (ASR). The model is designed for speech-to-text in three languages: Mandarin Chinese, English, and Cantonese.

While Xiaomi has not provided extensive details on the model's architecture or training dataset, its focused multilingual capability makes it a potentially valuable tool for developers building applications for these specific language markets. The inclusion of Cantonese is particularly notable, as it is often less supported than Mandarin in large-scale ASR systems.

Permissive and Practical

The release is notable because it comes from a major global electronics manufacturer, a sign that large corporations remain interested in contributing to the open AI ecosystem. The licensing terms add to the model's practical appeal.

Xiaomi has released MiMo-V2.5-ASR under the MIT license, one of the most permissive open-source licenses available. The license permits use, modification, and distribution, including for commercial purposes, on the condition that the copyright and license notice are retained. That removes a common barrier to adoption for many businesses and independent developers. The model and its usage instructions are available on its Hugging Face repository.

Sources

XiaomiMiMo/MiMo-V2.5-ASR (Hugging Face)


