Zhipu AI Releases Compact Bilingual Speech Model

The new GLM-ASR-Nano model is designed for efficient automatic speech recognition in both English and Mandarin Chinese.

Dec 9, 2025

Zhipu AI, a major contributor to the open-source LLM space with its GLM series, has expanded into a new modality with the release of GLM-ASR-Nano-2512. The model is purpose-built for automatic speech recognition (ASR), the task of converting spoken language into written text.

The model's key feature is its compact design, which makes it a strong candidate for applications that need efficiency and modest compute, such as on-device transcription. It offers an alternative to larger, cloud-dependent models and gives developers more flexibility for privacy-conscious or offline use cases.

Key Features

Bilingual: The model is designed to handle both English and Mandarin Chinese, two of the world's most widely spoken languages.
Compact Size: As a "Nano" model with under one billion parameters, it is sized to run on consumer-grade hardware.
Permissive License: The MIT license allows broad adoption, including in commercial products, provided the copyright and license notice are retained.
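
To put the "under one billion parameters" figure in hardware terms, here is a rough back-of-the-envelope sketch of how much memory the weights alone would occupy at common precisions. It assumes a ceiling of exactly one billion parameters, since the exact count is not given here, and it ignores activations, audio buffers, and runtime overhead. The helper name `weight_memory_gib` is hypothetical and used only for illustration.

```python
# Bytes needed to store one parameter at common weight precisions.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}


def weight_memory_gib(num_params: int, dtype: str) -> float:
    """Approximate memory for the model weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


# Assumption: 1e9 is an upper bound, because the article only says
# "under one billion parameters".
PARAM_CEILING = 1_000_000_000

for dtype in BYTES_PER_PARAM:
    gib = weight_memory_gib(PARAM_CEILING, dtype)
    print(f"{dtype}: under {gib:.2f} GiB for weights")
```

Even at full 32-bit precision, the weights would stay under 4 GiB by this estimate, which is consistent with the claim that the model targets consumer-grade hardware.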

This release signals Zhipu AI's ambition to build a broader ecosystem of models beyond text generation. By providing a permissively licensed, bilingual ASR tool, the company is offering a valuable building block for developers and competing in a space largely defined by models like OpenAI's Whisper. You can find the model and usage instructions on its Hugging Face repository.

Sources

zai-org/GLM-ASR-Nano-2512 (Hugging Face)
