Resemble AI Releases Dramabox Voice Cloning TTS Model

The new text-to-speech model uses a diffusion-transformer architecture for high-quality, expressive audio and one-shot voice cloning.

Apr 17, 2026

Resemble AI has publicly released Dramabox, a new text-to-speech (TTS) model designed for generating expressive and high-quality audio. The model's standout feature is its ability to perform one-shot voice cloning, replicating a speaker's voice from just a single short audio clip.

Under the hood, Dramabox uses a diffusion-transformer architecture. According to the company, the model builds on LTX-2 flow-matching audio technology, which gives fine-grained control over the generated speech. The goal is audio that carries emotional nuance and expressiveness rather than only clear narration, which remains a key challenge in speech synthesis.
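For readers unfamiliar with flow matching, the general idea is to generate a sample by integrating a velocity field from noise toward data. The following is a toy sketch of that idea only, not Dramabox's code: the `velocity` function is a closed-form stand-in for a trained network, and the names `velocity`, `sample`, `target`, and `steps` are all illustrative assumptions.

```python
import random


def velocity(x, t, target):
    """Stand-in for a trained network (assumption, not Dramabox's model).

    Returns the closed-form velocity of a straight-line path that
    reaches `target` at time t = 1.
    """
    return [(g - xi) / (1.0 - t) for xi, g in zip(x, target)]


def sample(target, steps=8, seed=0):
    """Euler-integrate the velocity field from Gaussian noise (t = 0) to t = 1."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in target]  # start from noise
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt  # t stays below 1, so 1 - t is never zero
        v = velocity(x, t, target)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x
```

In a real system the velocity would come from a neural network conditioned on text and a reference voice; here the target is fixed so the integration loop can be checked in isolation.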

Availability and License

Developers and researchers can access the model weights and inference code on the Hugging Face Hub. Dramabox is released under a custom Community License, which permits research and other non-commercial use but requires a separate commercial license for any business application.
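As a rough illustration of fetching the files, the sketch below uses the `huggingface_hub` package's `snapshot_download` function. It assumes the package is installed and that the repository is public and not gated; the helper name `download_dramabox` and the default `local_dir` are hypothetical choices, and the repository name is taken from the source listing for this article.

```python
REPO_ID = "ResembleAI/Dramabox"  # repository name as listed in this article's sources


def download_dramabox(local_dir="dramabox"):
    """Download every file in the repository and return the local path.

    Hypothetical helper: assumes the repo is public and not gated.
    """
    # Imported lazily so this module loads even without the package installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=REPO_ID, local_dir=local_dir)
```

Calling `download_dramabox()` would place the files in a local `dramabox` directory. The license terms in the repository apply to whatever is downloaded, so they are worth reading before any use beyond research.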

The release gives the open-weights community a new tool for creative and research-oriented audio projects. By combining a diffusion-transformer architecture with one-shot voice cloning, Dramabox adds a high-quality option to the publicly available TTS models for non-commercial applications.

Sources

ResembleAI/Dramabox (Hugging Face)
