
Kyutai Releases 1.6B Bilingual TTS Model

The French AI lab's new open-source model generates streaming audio in English and French under a permissive license.

Jun 30, 2025


French AI research lab Kyutai has released a new 1.6-billion-parameter text-to-speech (TTS) model that generates high-quality audio in both English and French. The model is available under a permissive Creative Commons license (CC-BY-4.0), allowing for broad use, including in commercial applications.

The system is built on what the team calls the Moshi stack, an adaptation of the VoiceCraft architecture. A key feature is its ability to support streaming audio generation, making it suitable for real-time applications where low latency is critical, such as conversational agents or live narration tools.
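To make the streaming point concrete, here is a toy sketch of the consumption pattern; it is not Kyutai's API. The `fake_tts_stream` function, the chunk size, and the one-chunk-per-word behavior are assumptions made up for illustration, and the "audio" is plain silence. It only shows that each chunk can be handed to playback before the next one is synthesized, which is what keeps latency low.

```python
from typing import Iterator, List

CHUNK_SAMPLES = 1920  # assumed chunk length in samples; illustrative only


def fake_tts_stream(text: str, log: List[str]) -> Iterator[bytes]:
    """Hypothetical streaming synthesizer: yields one chunk of silence per word."""
    for i, _word in enumerate(text.split()):
        log.append(f"synth {i}")
        yield b"\x00\x00" * CHUNK_SAMPLES  # 16-bit mono silence


def play(text: str) -> List[str]:
    """Consume chunks as they are produced and record the order of events."""
    log: List[str] = []
    for i, chunk in enumerate(fake_tts_stream(text, log)):
        # A real application would write `chunk` to an audio device here.
        log.append(f"play {i}")
    return log


events = play("bonjour hello world")
print(events)
# ['synth 0', 'play 0', 'synth 1', 'play 1', 'synth 2', 'play 2']
```

The interleaved log is the point: playback of the first chunk starts after a single synthesis step rather than after the whole sentence has been generated.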

A New Voice in Open TTS

While the open-source landscape for large language models has exploded, the space for high-quality, permissively licensed TTS models is less crowded. Kyutai's release provides a significant new building block for developers creating voice-enabled products without relying on proprietary, closed-source APIs.

Kyutai is a non-profit lab, and its release adds diversity to an ecosystem often dominated by a few large tech companies. Developers can access the model weights, explore an interactive demo, and find usage examples on the official Hugging Face repository.

Sources

kyutai/tts-1.6b-en_fr (Hugging Face)

