
Boson AI's Higgs Audio v3 Offers Expressive, Multilingual TTS

The new 4-billion-parameter text-to-speech model is available for non-commercial use, promising fine-grained control over vocal delivery.

Jun 4, 2026


Boson AI has released Higgs Audio v3, a 4-billion-parameter model for text-to-speech (TTS) synthesis. The model is designed to generate highly expressive, controllable speech across multiple languages, which makes it a notable new option for creators and researchers working with openly available audio models.

Fine-Grained Vocal Control

The primary differentiator for Higgs Audio v3 is its focus on granular control over vocal delivery. The system allows users to influence the output's style, emotion, and prosody in two main ways:

Text Prompts: Users can describe the desired vocal characteristics directly in a text prompt.
Audio References: The model can also clone a voice and its style from a reference audio clip.

This dual-control mechanism provides flexibility for a range of applications, from character voice generation to dynamic audiobook narration. The model currently supports eight languages, including English, French, German, Spanish, and Polish.
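To make the two control channels concrete, here is a minimal Python sketch of how a request to such a system might be modeled. Everything in it (the `SynthesisRequest` class, its `style_prompt` and `reference_audio` fields, and the `control_modes()` method) is hypothetical and is not the Higgs Audio v3 interface. The article does not say whether the model accepts both controls at once, or runs with neither, so the sketch only records which channels a request supplies.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SynthesisRequest:
    """Hypothetical request shape; NOT the actual Higgs Audio v3 API."""

    text: str                              # the words to be spoken
    style_prompt: Optional[str] = None     # natural-language description of delivery
    reference_audio: Optional[str] = None  # path to a clip whose voice/style to clone

    def __post_init__(self) -> None:
        # Reject empty input; this is the only rule the sketch enforces.
        if not self.text.strip():
            raise ValueError("text must not be empty")

    def control_modes(self) -> List[str]:
        """Report which of the two control channels this request uses."""
        modes = []
        if self.style_prompt is not None:
            modes.append("text_prompt")
        if self.reference_audio is not None:
            modes.append("audio_reference")
        return modes


if __name__ == "__main__":
    req = SynthesisRequest(
        text="Once upon a time...",
        style_prompt="warm, slow, slightly amused narrator",
    )
    print(req.control_modes())  # ['text_prompt']
```

Keeping the two controls as separate optional fields mirrors the article's description of them as independent ways to steer the output, without encoding any assumption about how the real model combines them.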

Higgs Audio v3 enters a competitive field of open TTS models. Its 4B parameter size and emphasis on direct, prompt-based style control make it an alternative for developers and creators who need more than a basic voice clone, particularly in applications where emotional nuance and precise delivery matter.
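For a rough sense of what a 4B parameter count implies for hardware, weight storage alone can be estimated as parameters × bytes per parameter. The sketch below is a back-of-the-envelope calculation, not a measured figure: it assumes a nominal 4,000,000,000 parameters, ignores activations, caches, any separate audio tokenizer or codec, and runtime overhead, and does not assume which precision the released checkpoint actually uses.

```python
# Rough weight-memory estimate for a nominal 4-billion-parameter model.
# Assumption: only the weights are counted; activations, caches, audio
# codec components, and framework overhead are ignored. The precision of
# the released checkpoint is not stated in the article.

PARAMS = 4_000_000_000

BYTES_PER_PARAM = {
    "fp32": 4.0,
    "bf16/fp16": 2.0,
    "int8": 1.0,
    "int4": 0.5,
}


def weight_memory_gb(num_params: int, bytes_per_param: float) -> float:
    """Return weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


if __name__ == "__main__":
    for dtype, nbytes in BYTES_PER_PARAM.items():
        print(f"{dtype:>9}: ~{weight_memory_gb(PARAMS, nbytes):.0f} GB")
```

Under these assumptions the weights come to roughly 16 GB in fp32, 8 GB in bf16/fp16, 4 GB at 8-bit, and 2 GB at 4-bit; actual memory use during inference will be higher.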

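The model is available for download from the Boson AI repository on Hugging Face. It is released under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 (CC-BY-NC-SA-4.0) license, which permits non-commercial use and adaptation, provided the work is credited and any adaptations are shared under the same license.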

Sources

bosonai/higgs-audio-v3-tts-4b
Hugging Face


