Mistral Enters Speech AI with Voxtral Mini Model
The company, known for its powerful text models, has released its first open-source speech recognition system designed for real-time, multilingual transcription.

Mistral AI, a company that has rapidly built a reputation for its powerful open-source text models, has released Voxtral Mini 4B Realtime, its first publicly available model for automatic speech recognition (ASR).
This new 4-billion-parameter system is designed specifically for real-time, multilingual speech-to-text applications. Its focus on low-latency performance makes it a candidate for tasks like live captioning, meeting transcription, and voice-activated assistants where immediate feedback is critical.
A New Modality for Mistral
The release signals a significant expansion for the Paris-based AI lab. While the company is best known for text generation models such as Mistral 7B and Mixtral, it is now entering the competitive audio AI space. This move positions Voxtral as an open-source alternative to established ASR systems, including OpenAI's popular Whisper model.
By releasing Voxtral Mini under a permissive Apache 2.0 license, Mistral continues its strategy of providing foundational tools for developers. The model is now available for download and experimentation on the Hugging Face Hub, allowing the community to build upon and integrate it into new voice-powered applications.
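For readers who want to try the download step, the following is a minimal sketch and not an official quick-start. It assumes the `huggingface_hub` Python package is installed and that the repository id matches the source link for this article. The helper names `hub_url` and `download_model` and the `voxtral-mini` target folder are hypothetical, chosen only for illustration. The repository may also require accepting terms or supplying an access token, which this sketch does not handle.

```python
"""Sketch: fetch the Voxtral Mini files from the Hugging Face Hub."""
from pathlib import Path

# Repository id as given in the article's source link.
REPO_ID = "mistralai/Voxtral-Mini-4B-Realtime-2602"


def hub_url(repo_id: str) -> str:
    """Return the model page URL for a Hub repository id."""
    return f"https://huggingface.co/{repo_id}"


def download_model(repo_id: str = REPO_ID, target_dir: str = "voxtral-mini") -> Path:
    """Download every file in the repository and return the local folder.

    The import is done lazily so this module loads even when
    huggingface_hub is not installed (pip install huggingface_hub).
    `target_dir` is a hypothetical folder name used for illustration.
    """
    from huggingface_hub import snapshot_download

    local_path = snapshot_download(repo_id=repo_id, local_dir=target_dir)
    return Path(local_path)
```

Calling `download_model()` would pull the full set of repository files, which for a 4-billion-parameter model is likely to be several gigabytes. Running transcription afterwards depends on whichever inference stack the model card recommends, which this article does not specify.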
Sources
- mistralai/Voxtral-Mini-4B-Realtime-2602 (Hugging Face)