Hume AI Releases 3B Multilingual Text-to-Speech Model
The new model, Tada-3B-ML, is designed for fine-grained control over vocal expression across more than 10 languages.

Hume AI has introduced Tada-3B-ML, a 3-billion-parameter model for text-to-speech (TTS) synthesis. Released on Hugging Face, the model is designed to generate natural-sounding speech with a high degree of expressive control, which remains a key challenge in building human-like voice interfaces.
A central feature of Tada-3B-ML is its multilingual capability. The model supports a broad set of languages, enabling developers to create voice applications for a global audience. Supported languages include:
- English
- Spanish
- French
- German
- Mandarin Chinese
- Japanese
- Korean
- Hindi
- Portuguese
- Italian
The release adds to a growing body of work on expressive, multilingual speech generation. By aiming to capture the subtle prosody and intonation of human speech, models like Tada-3B-ML could support more nuanced and emotionally resonant voice assistants, audiobooks, and accessibility tools.
An Important Note on Licensing
While the model weights are publicly available, they are governed by a custom license from Hume AI, not a permissive open-source license like Apache 2.0 or MIT. The terms focus on research and non-commercial use and include specific restrictions. Potential users should review the license carefully before integrating Tada-3B-ML into their work.
Sources
- HumeAI/tada-3b-ml (Hugging Face)