
Nari Labs Releases Dia2-2B, an Open Voice Cloning Model

The 2-billion-parameter text-to-speech model can clone voices from a short audio sample and is available under an Apache 2.0 license.

Nov 15, 2025


Nari Labs has introduced Dia2-2B, an open-source text-to-speech (TTS) model with 2 billion parameters. It is designed for high-fidelity audio generation and is released under the permissive Apache 2.0 license, which allows both commercial and research use.
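For a rough sense of what 2 billion parameters means in practice, weight storage can be estimated as parameter count times bytes per parameter. The sketch below is a back-of-the-envelope calculation, not a published requirement: it assumes exactly 2e9 parameters (the article's rounded figure), uses standard numeric widths, and ignores activations and any auxiliary components. The names `PARAMS`, `BYTES_PER_PARAM`, and `weight_gigabytes` are illustrative.

```python
# Rough weight-storage estimate. Assumes exactly 2e9 parameters (a rounded
# figure) and counts only the weights, not activations or other components.
PARAMS = 2_000_000_000
BYTES_PER_PARAM = {"float32": 4, "float16/bfloat16": 2, "int8": 1}


def weight_gigabytes(params: int, bytes_per_param: int) -> float:
    """Return weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


for dtype, width in BYTES_PER_PARAM.items():
    print(f"{dtype}: ~{weight_gigabytes(PARAMS, width):.1f} GB")
```

Under these assumptions the weights alone come to about 8 GB in 32-bit floats, 4 GB in 16-bit floats, and 2 GB in 8-bit integers.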

The model's primary capability is zero-shot voice cloning. Given a brief audio sample, it captures the speaker's acoustic properties, including timbre, rhythm, and prosody, and then generates new speech in that voice from arbitrary text. Custom voices can therefore be produced without training a new model for each speaker.

Technical Foundations

Dia2-2B is a diffusion-based model, a class of generative models that produce output by iteratively denoising a noisy signal. It was trained on more than 200,000 hours of English speech sourced from public-domain audiobooks. Although it builds on concepts from the Bark model, Dia2 has a distinct architecture and was trained on an entirely new dataset.

This release gives developers an openly available alternative to proprietary TTS and voice cloning APIs. Possible applications range from personalized digital assistants to content creation tools. The model can be downloaded from the Nari Labs Hugging Face repository.
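One way to fetch the files is the `snapshot_download` function from the `huggingface_hub` Python package, sketched below. The repository ID is taken from the source listing for this article. The helper name `download_model` and the default target directory are hypothetical, and the sketch covers only downloading: the repository's own documentation is the place to look for how to load the model and run inference.

```python
from pathlib import Path

REPO_ID = "nari-labs/Dia2-2B"  # repository name as given in the article's source listing


def download_model(target_dir: str = "dia2-2b") -> Path:
    """Download all files in the repository into target_dir (hypothetical helper)."""
    # Imported lazily so this file can be loaded without the package installed.
    from huggingface_hub import snapshot_download

    local_path = snapshot_download(repo_id=REPO_ID, local_dir=target_dir)
    return Path(local_path)


# Usage (needs network access and enough disk space for the weights):
#   path = download_model()
#   print(sorted(p.name for p in path.iterdir()))
```

Keeping the download in a function, rather than running it at import time, means nothing is fetched until `download_model()` is called explicitly.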

Sources

nari-labs/Dia2-2B
Hugging Face

