Zyphra Releases Open-Source Zonos 2 TTS Model
The new text-to-speech model offers a commercially permissive alternative for developers in a field still dominated by closed-source APIs.
AI research company Zyphra has released Zonos 2, an open-weight model for text-to-speech (TTS) synthesis. The model generates human-like audio from text input and can serve as a building block for applications that need voice output.
The most significant aspect of the release is its license: Apache 2.0, a permissive open-source license that allows commercial use, modification, and distribution. Many high-quality TTS systems, by contrast, are accessible only through proprietary, paid APIs, so the release gives developers a new option for building and owning their voice generation stack.
While Zyphra has not yet published detailed technical specifications or performance benchmarks, the model and its weights are available for download on Hugging Face. This allows developers and researchers to immediately begin experimenting with the model and integrating it into their projects.
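For readers who want to fetch the weights programmatically, the following is a minimal sketch rather than an official recipe. It assumes the repository ID is `Zyphra/ZONOS2`, as listed in this article's sources, and that the third-party `huggingface_hub` package is installed. The helper names `repo_url` and `download_weights` and the `zonos2` target directory are hypothetical, and because Zyphra has not published technical details, the sketch makes no assumptions about the files inside the repository or how to run inference with them.

```python
from pathlib import Path

# Assumption: repository ID as listed in the article's sources; verify it before use.
REPO_ID = "Zyphra/ZONOS2"


def repo_url(repo_id: str = REPO_ID) -> str:
    """Return the Hugging Face model page URL for a repository ID."""
    return f"https://huggingface.co/{repo_id}"


def download_weights(repo_id: str = REPO_ID, local_dir: str = "zonos2") -> Path:
    """Download every file in the repository into local_dir and return that path.

    Hypothetical helper: it fetches a full snapshot because the repository's
    file layout is not documented in the article.
    """
    # Imported here so the module loads even when huggingface_hub is absent.
    from huggingface_hub import snapshot_download  # pip install huggingface_hub

    return Path(snapshot_download(repo_id=repo_id, local_dir=local_dir))
```

Calling `download_weights()` performs the download, which needs network access and enough disk space for the full snapshot. How the files are loaded for inference depends on tooling that the article does not describe.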
Zonos 2 represents a welcome expansion of open-source AI into modalities beyond text generation. As developers build more complex multimodal applications, high-quality, permissively licensed components for audio, vision, and other modalities will become increasingly important.
Sources
- Zyphra/ZONOS2 (Hugging Face)