Zhipu AI Releases Compact Bilingual Speech Model
The new GLM-ASR-Nano model is designed for efficient automatic speech recognition in both English and Mandarin Chinese.

Zhipu AI, a major contributor to the open-source LLM space with its GLM series, has expanded into a new modality with the release of GLM-ASR-Nano-2512. The new model is purpose-built for automatic speech recognition (ASR), the task of converting spoken language into written text.
The model's key feature is its compact design, making it a strong candidate for applications that require efficiency and lower computational resources, such as on-device transcription. This approach provides an alternative to larger, cloud-dependent models, offering developers more flexibility for privacy-conscious or offline use cases.
Key Features
- Bilingual: The model is designed to handle both English and Mandarin Chinese, two of the world's most widely spoken languages.
- Compact Size: As a "Nano" model with roughly 1.5 billion parameters, it is sized to run on consumer-grade hardware.
- Permissive License: Its release under the MIT license allows for broad adoption, including in commercial products, without significant restrictions.
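To put the compact-size point in rough numbers, the sketch below estimates how much memory a model's weights occupy at different numeric precisions. The function name, the precision table, and the parameter counts are illustrative assumptions, not figures published by Zhipu AI. Real memory use is higher, since it also includes activations, audio buffers, and runtime overhead.

```python
# Bytes needed to store one parameter at common numeric precisions.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}


def weight_memory_gb(num_params: int, dtype: str = "float16") -> float:
    """Return the approximate size of the model weights in gigabytes (10^9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    # Illustrative parameter counts, not official figures for any specific model.
    for params in (500_000_000, 1_000_000_000, 1_500_000_000):
        for dtype in BYTES_PER_PARAM:
            size = weight_memory_gb(params, dtype)
            print(f"{params / 1e9:.1f}B params @ {dtype}: ~{size:.1f} GB")
```

By this estimate, models in the range of one to two billion parameters stay within a few gigabytes at half precision, which is why they are plausible candidates for laptops and other consumer devices.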
This release signals Zhipu AI's ambition to build a broader ecosystem of models beyond text generation. By providing a permissively licensed, bilingual ASR tool, the company is offering a valuable building block for developers and competing in a space largely defined by models like OpenAI's Whisper. You can find the model and usage instructions on its Hugging Face repository.
Sources
- zai-org/GLM-ASR-Nano-2512 (Hugging Face)