Mistral AI Releases Voxtral, an Open-Source TTS Model

The French AI company expands beyond large language models with a new 4-billion-parameter model for generating multilingual speech.

Nov 17, 2025

Mistral AI, known for its influential open-weight language models, has ventured into a new domain with the release of Voxtral 4B TTS. The new 4-billion-parameter model is designed for multilingual text-to-speech (TTS) synthesis and is available under the permissive Apache 2.0 license.

The release marks a significant expansion for the company, moving it beyond its core focus on text generation and into the competitive space of open-source generative audio. While Mistral has built its reputation on models like Mistral 7B and Mixtral, Voxtral is its first major public release dedicated to audio synthesis.

A New Voice in Open Audio

At 4 billion parameters, the model appears aimed at high-quality, natural-sounding speech across multiple languages. Developers and researchers can access the model and its technical details directly from its official repository on Hugging Face.
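For readers who want to inspect the repository programmatically, the following is a minimal sketch rather than an official Mistral AI workflow. It uses only the Python standard library and the Hugging Face Hub's public model-metadata endpoint. The repository name is taken from the Sources entry; the helper names (`model_info_url`, `fetch_model_info`, `list_files`) are hypothetical, and the sketch assumes the repository is publicly readable and that the metadata includes a `siblings` file list.

```python
import json
import urllib.request

# Repository name as listed under Sources in this article.
REPO_ID = "mistralai/Voxtral-4B-TTS-2603"


def model_info_url(repo_id: str) -> str:
    """Build the Hugging Face Hub metadata URL for a model repository."""
    return f"https://huggingface.co/api/models/{repo_id}"


def fetch_model_info(repo_id: str, timeout: float = 10.0) -> dict:
    """Download the repository's metadata as a dict (requires network access)."""
    with urllib.request.urlopen(model_info_url(repo_id), timeout=timeout) as resp:
        return json.load(resp)


def list_files(info: dict) -> list[str]:
    """Return file names from the metadata.

    Assumption: files appear under a 'siblings' list of {'rfilename': ...}
    entries; a missing list yields an empty result.
    """
    return [s["rfilename"] for s in info.get("siblings", []) if "rfilename" in s]
```

With network access, `list_files(fetch_model_info(REPO_ID))` should return the file names in the repository, provided the assumptions above hold. If the repository requires authentication, the request would fail with an HTTP error instead.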

By choosing the Apache 2.0 license, Mistral AI is enabling broad commercial and research use. A capable open-source TTS model could accelerate development in areas such as customized voice assistants, automated content creation, and accessibility tools, and it gives developers an alternative to proprietary APIs.

Sources

mistralai/Voxtral-4B-TTS-2603, Hugging Face
