Mistral AI/Speech → Text

Mistral Enters Speech AI with Voxtral Mini Model

The company, known for its powerful text models, has released its first open-source speech recognition system designed for real-time, multilingual transcription.

Jan 21, 2026

Notable · Apache 2.0

Mistral AI, a company that has rapidly built a reputation for its powerful open-source text models, has released Voxtral Mini 4B Realtime, its first publicly available model for automatic speech recognition (ASR).

This new 4-billion-parameter system is designed specifically for real-time, multilingual speech-to-text applications. Its focus on low-latency performance makes it a candidate for tasks like live captioning, meeting transcription, and voice-activated assistants where immediate feedback is critical.

A New Modality for Mistral

The release marks a significant expansion for the Paris-based AI lab. Best known for text generation models such as Mistral 7B and Mixtral, the company is now entering the competitive audio AI space. The move positions Voxtral as an open-source alternative to established ASR systems, including OpenAI's widely used Whisper model.

By releasing Voxtral Mini under a permissive Apache 2.0 license, Mistral continues its strategy of providing foundational tools for developers. The model is now available for download and experimentation on the Hugging Face Hub, allowing the community to build upon and integrate it into new voice-powered applications.

Sources

mistralai/Voxtral-Mini-4B-Realtime-2602
Hugging Face

