Resemble AI Releases Chatterbox Turbo for Open TTS

The new text-to-speech model focuses on performance and offers voice cloning capabilities for English under a permissive MIT license.

Dec 2, 2025

Resemble AI, a company specializing in synthetic voice technology, has released a new open-source model named Chatterbox Turbo. The model is designed for high-performance text-to-speech (TTS) generation in English, targeting developers who need fast and efficient voice output in their applications.

Beyond standard speech synthesis, Chatterbox Turbo includes voice cloning, allowing users to generate speech in a specific target voice from a short audio sample. The entire project is available on Hugging Face under the MIT license, which permits use and modification in both academic and commercial projects.

This release adds another contender to the rapidly growing field of open-source speech generation. Proprietary APIs have dominated high-quality TTS, and permissively licensed models like Chatterbox Turbo give developers a self-hostable alternative. Coming from a commercial provider, the release also reflects a broader trend of companies contributing foundational models back to the community.

Developers interested in experimenting with the model can find the necessary code and instructions on the official Hugging Face repository.

Sources

ResembleAI/chatterbox-turbo (Hugging Face)
