Boson AI's Higgs Audio v3 Offers Expressive, Multilingual TTS
The new 4-billion-parameter text-to-speech model is available for non-commercial use, promising fine-grained control over vocal delivery.

Boson AI has released Higgs Audio v3, a new 4-billion-parameter model for text-to-speech (TTS) synthesis. The model is designed to generate highly expressive and controllable speech across multiple languages, positioning it as a new tool for creators and researchers working with openly available audio models.
Fine-Grained Vocal Control
The primary differentiator for Higgs Audio v3 is its focus on granular control over vocal delivery. The system allows users to influence the output's style, emotion, and prosody in two main ways:
- Text Prompts: Users can describe the desired vocal characteristics directly in a text prompt.
- Audio References: The model can also clone a voice and its style from a reference audio clip.
This dual-control mechanism provides flexibility for a range of applications, from character voice generation to dynamic audiobook narration. The model currently supports eight languages, including English, French, German, Spanish, and Polish.
Higgs Audio v3 enters a competitive field of open TTS models. Its 4-billion-parameter size and emphasis on direct, prompt-based style control offer an alternative for developers and creators who need more than a simple voice clone. That expressiveness matters most in applications where emotional nuance and specific delivery are key.
The model is available for download from the Boson AI repository on Hugging Face. It has been released under a Creative Commons license (CC BY-NC-SA 4.0), which permits non-commercial use and adaptation provided that users give attribution and share any derivatives under the same terms.
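For readers who want to fetch the weights, the following is a minimal sketch rather than an official recipe. It assumes the repository ID listed under Sources is accurate and not gated, and that the `huggingface_hub` package is installed; the `download_model` helper and the `higgs-audio-v3` directory name are illustrative choices, not part of Boson AI's tooling.

```python
from pathlib import Path

# Repository ID as listed in this article's Sources section (assumed accurate).
REPO_ID = "bosonai/higgs-audio-v3-tts-4b"


def download_model(local_dir: str = "higgs-audio-v3") -> Path:
    """Fetch a snapshot of the model repository and return its local path.

    Hypothetical helper for illustration; the directory name is arbitrary.
    """
    # Imported lazily so this module still loads without huggingface_hub.
    from huggingface_hub import snapshot_download

    path = snapshot_download(repo_id=REPO_ID, local_dir=local_dir)
    return Path(path)


if __name__ == "__main__":
    # Downloads several gigabytes; requires network access.
    print(download_model())
```

A 4-billion-parameter checkpoint is a large download, so it is worth checking available disk space first. How to load the files and run inference is not covered in this article and depends on the instructions in the model repository.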
Sources
- bosonai/higgs-audio-v3-tts-4b (Hugging Face)