Mistral AI Releases Voxtral, an Open-Source TTS Model
The French AI leader expands beyond large language models with a new 4-billion-parameter model for generating multilingual speech.
Mistral AI, known for its influential open-weight language models, has ventured into a new domain with the release of Voxtral 4B TTS. The new 4-billion-parameter model is designed for multilingual text-to-speech (TTS) synthesis and is available under the permissive Apache 2.0 license.
The release marks a significant expansion for the company, moving it beyond its core focus on text generation and into the competitive space of open-source generative audio. While Mistral has built its reputation on models like Mistral 7B and Mixtral, Voxtral is its first major public release dedicated to audio synthesis.
A New Voice in Open Audio
The model's size suggests a focus on generating high-quality, natural-sounding speech across multiple languages. Developers and researchers can access the model and its technical details directly from its official repository on Hugging Face.
By choosing the Apache 2.0 license, Mistral AI is enabling broad commercial and research use. The availability of a powerful, open-source TTS model could accelerate development in areas like customized voice assistants, automated content creation, and accessibility tools, providing a strong alternative to proprietary APIs.
Sources
- mistralai/Voxtral-4B-TTS-2603 (Hugging Face)