Xiaomi Releases MiMo Model for Speech Recognition
The new open-source model from the Chinese tech giant offers automatic speech recognition for Mandarin, Cantonese, and English under a permissive MIT license.

Chinese technology company Xiaomi has released MiMo-V2.5-ASR, an open-source model for automatic speech recognition (ASR). The model converts speech to text in three languages: Mandarin Chinese, English, and Cantonese.
Xiaomi has not provided extensive details on the model's architecture or training dataset. Even so, its focused multilingual capability makes it a potentially valuable tool for developers building applications for these language markets. The inclusion of Cantonese is particularly notable, because Cantonese is often less well supported than Mandarin in large-scale ASR systems.
Permissive and Practical
The release is significant because it comes from a major global electronics manufacturer, signaling continued interest from large corporations in contributing to the open AI ecosystem. The model's licensing terms further enhance its utility.
Xiaomi has released MiMo-V2.5-ASR under the MIT license, one of the most permissive open-source licenses available. The license allows use, modification, and distribution, including for commercial purposes, with few conditions beyond retaining the copyright and license notice. This removes a common barrier to adoption for many businesses and independent developers. The model and its usage instructions are available in its Hugging Face repository.
Sources
- XiaomiMiMo/MiMo-V2.5-ASR (Hugging Face)