Kani TTS 370M Offers Compact Multilingual Speech
Built on Liquid AI's LFM2 (Liquid Foundation Models) architecture, the new model offers an efficient solution for developers.
A new, efficient text-to-speech model called Kani TTS 370M has been released on the Hugging Face Hub. Developed by the user nineninesix, the model contains 370 million parameters, offering a relatively lightweight option for generating high-quality, multilingual speech.
The model is built on an LFM2 backbone; LFM2 is Liquid AI's second generation of compact Liquid Foundation Models. It is described as handling multiple languages without relying on explicit language identification tags during training or inference, instead learning to synthesize different languages from a mixed dataset. This design choice can make a model more flexible and scalable for diverse linguistic applications.
Kani TTS 370M's compact size is its most notable feature. In a field often dominated by multi-billion-parameter models, a smaller footprint makes it more accessible for researchers and developers with limited computational resources. This could enable its use in on-device applications or lower-cost cloud deployments where efficiency is a primary concern.
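To put the 370-million-parameter figure in perspective, the following is a rough back-of-the-envelope sketch of how much memory the weights alone would occupy at common numeric precisions. It assumes the parameter count stated above and covers weight storage only: activations, any audio codec the model is paired with, and runtime overhead are not included. The int8 row is hypothetical, since no quantized release is claimed here.

```python
# Rough weight-memory estimate for a 370M-parameter model.
# Assumption: only raw weight storage is counted, nothing else.
PARAMS = 370_000_000  # parameter count stated in the article

# Bytes needed to store one parameter at each precision.
# The int8 entry is a hypothetical quantized variant, for comparison only.
BYTES_PER_PARAM = {
    "float32": 4,
    "float16/bfloat16": 2,
    "int8": 1,
}


def weight_memory_mb(params: int, bytes_per_param: int) -> float:
    """Return weight storage in megabytes (1 MB = 10**6 bytes)."""
    return params * bytes_per_param / 1e6


def footprint_table(params: int = PARAMS) -> dict:
    """Map each precision name to its estimated weight size in MB."""
    return {
        name: weight_memory_mb(params, size)
        for name, size in BYTES_PER_PARAM.items()
    }


if __name__ == "__main__":
    for name, mb in footprint_table().items():
        print(f"{name:>18}: {mb:,.0f} MB")
```

Under these assumptions, the weights come to about 1.5 GB in 32-bit floats and about 0.74 GB in 16-bit floats, which is small enough to fit comfortably on consumer GPUs and many edge devices.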
The model weights and usage instructions are publicly available on its Hugging Face repository. While the weights are accessible, the license is listed as "All rights reserved," indicating that it is not intended for commercial use without permission from the creator.
Sources
- nineninesix/kani-tts-370m, Hugging Face