Aratako/Text → Speech

Irodori-TTS v2 Offers Open Japanese Speech Synthesis

The 500-million-parameter model from researcher Aratako provides a high-quality, single-speaker voice under a permissive MIT license.

Mar 23, 2026

Update · MIT

Independent AI researcher Aratako has released the second version of Irodori-TTS, a 500-million-parameter text-to-speech model for Japanese. The model synthesizes speech in a single female voice and adds to the small pool of high-quality open-source speech models for languages other than English.

The model is built on the VITS (Variational Inference with adversarial learning for end-to-end Text-to-Speech) architecture, a popular framework known for producing natural-sounding speech. It was trained on the JSUT dataset, a public Japanese speech corpus from a single female speaker. The full model is now available on the Hugging Face Hub for community use.
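For readers who want to fetch the release, the following is a minimal sketch rather than the author's documented workflow. It assumes the third-party `huggingface_hub` package is installed and that the repository id listed under Sources is a standard model repository. The helper names `model_page_url` and `download_model` are illustrative, not part of the release. The sketch stops at downloading the files, since the article does not describe how the model is loaded or run.

```python
from pathlib import Path

# Repository id as listed in this article's Sources section.
REPO_ID = "Aratako/Irodori-TTS-500M-v2"


def model_page_url(repo_id: str = REPO_ID) -> str:
    """Return the Hugging Face Hub web page for a model repository."""
    return f"https://huggingface.co/{repo_id}"


def download_model(repo_id: str = REPO_ID) -> Path:
    """Download all files in the repository into the local Hub cache.

    Assumption: `pip install huggingface_hub` has been run and the
    machine has network access. Returns the local snapshot directory.
    """
    # Imported lazily so this module still loads without the dependency.
    from huggingface_hub import snapshot_download

    return Path(snapshot_download(repo_id=repo_id))
```

Calling `download_model()` returns the local directory holding the downloaded files. Check the model card at `model_page_url()` for the actual inference instructions before going further.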

By releasing Irodori-TTS under the MIT license, Aratako lets developers and researchers freely integrate Japanese speech synthesis into their applications or use the model as a foundation for further research. While many advanced text-to-speech systems remain proprietary, the release shows how independent contributors can help build a more open and multilingual AI ecosystem.

Sources

Aratako/Irodori-TTS-500M-v2 (Hugging Face)
