Boson AI Releases Higgs Audio v2 for Expressive TTS
The new 3-billion-parameter model focuses on generating expressive, multilingual speech and is fully open for commercial use under an Apache 2.0 license.

Boson AI has introduced Higgs Audio v2, a new open-source model for text-to-speech and audio generation. With three billion parameters, the model is designed to produce expressive and natural-sounding voices across multiple languages, adding a notable new entry to the competitive audio synthesis landscape.
The release is significant not just for its scale but also for its accessibility. Higgs Audio v2 is available under a permissive Apache 2.0 license, clearing the way for both research and commercial applications. This makes it a compelling alternative to proprietary APIs, offering developers a powerful foundation for building custom audio-centric features.
A Focus on Expressive Synthesis
According to Boson AI, the model specializes in "expressive voice synthesis," aiming for more nuance and emotion in its output than flatter, monotone TTS systems provide. This capability matters for applications that need natural, human-like speech, such as:
- Dynamic character voices in gaming and entertainment
- Engaging narration for audiobooks and podcasts
- More sophisticated and personable virtual assistants
The Higgs Audio v2 base model is now available on Hugging Face, allowing the community to begin experimenting with its capabilities and fine-tuning it for specific use cases.
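For readers who want to fetch the checkpoint, the following is a minimal sketch rather than an official recipe. It assumes the `huggingface_hub` Python package is installed and uses its `snapshot_download` function. The repository ID is the one listed under Sources. The helper names `model_page_url` and `download_checkpoint` and the target directory are hypothetical, chosen for illustration. Loading the weights and running inference depends on Boson AI's own code and is not shown.

```python
from pathlib import Path

# Repository ID as listed in the article's source link.
REPO_ID = "bosonai/higgs-audio-v2-generation-3B-base"


def model_page_url(repo_id: str = REPO_ID) -> str:
    """Return the Hugging Face model page URL for a repository ID."""
    return f"https://huggingface.co/{repo_id}"


def download_checkpoint(repo_id: str = REPO_ID,
                        target_dir: str = "higgs-audio-v2") -> Path:
    """Download every file in the repository into a local directory.

    Assumes `pip install huggingface_hub` and network access. A
    3-billion-parameter checkpoint is a multi-gigabyte download.
    """
    # Imported lazily so this module loads even without the dependency.
    from huggingface_hub import snapshot_download

    local_path = snapshot_download(repo_id=repo_id, local_dir=target_dir)
    return Path(local_path)


if __name__ == "__main__":
    print(model_page_url())
    # Uncomment to fetch the weights (hypothetical target directory):
    # print(download_checkpoint())
```

The download call is left commented out so the script can be run without immediately pulling several gigabytes of weights.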
Sources
- bosonai/higgs-audio-v2-generation-3B-base (Hugging Face)