Kani-TTS-2 Offers New Open-Source Voice Generation
An independent researcher has released a new English text-to-speech model under a permissive license, built on a modern generative foundation.
A new, permissively licensed text-to-speech (TTS) model called Kani-TTS-2 has been released by the independent researcher known as 'nineninesix'. The English-language model is now available for download and use on the Hugging Face Hub, providing another option for developers seeking open-source voice generation tools.
Technically, Kani-TTS-2 is built on an LFM2 backbone. LFM2 is Liquid AI's second-generation Liquid Foundation Model, a compact language-model architecture designed for efficient inference. The release did not include performance benchmarks, so the model's quality and speed have yet to be independently measured.
Why It Matters
The release of Kani-TTS-2 contributes to a growing ecosystem of open audio models. With its Apache 2.0 license, developers can freely use, modify, and integrate the model into both research and commercial applications. This stands in contrast to the many proprietary, API-gated TTS services, offering greater flexibility and control for building voice-enabled products.
You can explore the model, listen to samples, and find implementation details in the official Kani-TTS-2 repository on Hugging Face.
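For readers who want to inspect the repository before downloading anything, the following is a minimal sketch that queries the public Hugging Face Hub metadata endpoint using only the Python standard library. The repository name comes from the source listing for this article. The helper names (`model_api_url`, `fetch_model_info`, `list_files`) are illustrative and not part of any Kani-TTS-2 tooling, and the sketch assumes the repository is public and not gated. It does not run inference; the model card remains the reference for how to load the model.

```python
import json
import urllib.request

# Repository name as listed in this article's sources.
REPO_ID = "nineninesix/kani-tts-2-en"


def model_api_url(repo_id: str) -> str:
    """Return the Hugging Face Hub REST endpoint for a model's metadata."""
    return f"https://huggingface.co/api/models/{repo_id}"


def fetch_model_info(repo_id: str, timeout: float = 10.0) -> dict:
    """Fetch public metadata for a model repo (needs network access)."""
    with urllib.request.urlopen(model_api_url(repo_id), timeout=timeout) as resp:
        return json.load(resp)


def list_files(info: dict) -> list[str]:
    """Extract file names from the 'siblings' field of the metadata."""
    return [entry["rfilename"] for entry in info.get("siblings", [])]


if __name__ == "__main__":
    info = fetch_model_info(REPO_ID)
    # Tags typically include the license, which is worth confirming yourself.
    print("tags:", info.get("tags"))
    for name in list_files(info):
        print(name)
```

Checking the reported tags and file list this way is a quick method to confirm the license and see which weights and configuration files the repository actually ships.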
Sources
- nineninesix/kani-tts-2-en (Hugging Face)