Irodori-TTS v2 Offers Open Japanese Speech Synthesis
The 500-million-parameter model from researcher Aratako provides a high-quality, single-speaker voice under a permissive MIT license.
Independent AI researcher Aratako has released the second version of Irodori-TTS, a 500-million-parameter text-to-speech model for Japanese. The model generates speech in a single female voice and helps address the shortage of high-quality, open-source speech models for languages other than English.
The model is built on the VITS (Variational Inference with adversarial learning for end-to-end Text-to-Speech) architecture, an end-to-end framework that generates audio directly from text without a separate vocoder stage and is widely used for its natural-sounding output. It was trained on the JSUT dataset, a public Japanese speech corpus recorded by a single female speaker. The full model is now available on the Hugging Face Hub for community use.
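For a rough sense of what downloading and loading a model of this size involves, raw weight storage can be estimated as parameter count times bytes per parameter. The sketch below assumes exactly 500 million parameters (the published figure is rounded) and ignores activations, runtime buffers, and file-format overhead; the precision in which the released checkpoint is actually stored is not stated in the source, so the dtype table is purely illustrative.

```python
# Rough weight-storage estimate: parameters x bytes per parameter.
# Assumption: exactly 500 million parameters (the stated figure is rounded).
# The dtype choices below are illustrative, not taken from the model card.

BYTES_PER_PARAM = {
    "fp32": 4,  # 32-bit floating point
    "fp16": 2,  # 16-bit floating point (same size as bf16)
    "int8": 1,  # 8-bit quantized weights
}


def weight_size_gb(num_params: int, dtype: str) -> float:
    """Return raw weight storage in gigabytes (10**9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    n = 500_000_000
    for dtype in ("fp32", "fp16", "int8"):
        print(f"{dtype}: {weight_size_gb(n, dtype):.1f} GB")
```

Under these assumptions the weights come to about 2 GB at full precision and about 1 GB at half precision, small enough to fit on a single consumer GPU; real checkpoint files will differ somewhat because of metadata and the rounded parameter count.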
By releasing Irodori-TTS under the permissive MIT license, Aratako lets developers and researchers freely integrate Japanese speech synthesis into their applications or use the model as a foundation for further research. With many advanced text-to-speech systems still proprietary, the release shows how independent contributors can help build a more open, multilingual AI ecosystem.
Sources
- Aratako/Irodori-TTS-500M-v2 (Hugging Face)