OpenMOSS Releases KugelAudio for European Languages
The new text-to-speech model uses a hybrid diffusion and autoregressive architecture for high-quality, multilingual synthesis.
OpenMOSS has introduced KugelAudio-0-open, a text-to-speech (TTS) model designed to generate high-quality audio in several European languages. The release gives researchers and developers working on multilingual voice applications a downloadable model to experiment with.
A Hybrid Approach to Synthesis
KugelAudio employs a two-stage architecture that combines two widely used techniques in audio generation: diffusion and autoregressive modeling. First, a diffusion-based model converts input text into a mel-spectrogram, a time-frequency representation of the audio whose frequency bands are spaced on the perceptual mel scale. This spectrogram is then fed into an autoregressive vocoder, which synthesizes the final audio waveform one step at a time, with each step conditioned on the output generated so far. The model was trained on an internal dataset of over 20,000 hours of multilingual audio.
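The data flow described above can be illustrated with a toy sketch. This is not KugelAudio's code: both stages are simple numerical stand-ins with no trained network, and the names `toy_diffusion_acoustic_model` and `toy_autoregressive_vocoder`, the band count `N_MELS`, and the hop size `HOP` are all assumptions made for illustration. The sketch only shows the shape of the pipeline: text becomes a mel-spectrogram through iterative refinement from noise, and the spectrogram becomes a waveform one sample at a time.

```python
import numpy as np

N_MELS = 80  # assumed number of mel bands (not stated in the article)
HOP = 4      # assumed audio samples per mel frame, kept tiny for the demo


def toy_diffusion_acoustic_model(text: str, steps: int = 10, seed: int = 0) -> np.ndarray:
    """Stage 1 stand-in: refine random noise into a (n_frames, N_MELS) mel-spectrogram."""
    rng = np.random.default_rng(seed)
    # Toy duration rule: two mel frames per input character.
    codes = np.array([ord(c) for c in text], dtype=np.float64).repeat(2)
    # Toy text conditioning: a fixed target pattern derived from the character codes.
    target = np.sin(codes[:, None] * 0.01 * np.arange(1, N_MELS + 1)[None, :])
    mel = rng.standard_normal(target.shape)  # start from pure noise
    for _ in range(steps):
        # A real diffusion model would use a neural network to predict the noise to remove;
        # here each step simply moves halfway toward the target.
        mel = mel + 0.5 * (target - mel)
    return mel


def toy_autoregressive_vocoder(mel: np.ndarray) -> np.ndarray:
    """Stage 2 stand-in: emit one sample at a time from the previous sample and the current frame."""
    n_samples = mel.shape[0] * HOP
    audio = np.zeros(n_samples)
    prev = 0.0
    for t in range(n_samples):
        frame = mel[t // HOP]  # the mel frame that covers this sample
        # Autoregression: the new sample depends on the previously generated sample.
        prev = float(np.tanh(0.9 * prev + 0.1 * frame.mean()))
        audio[t] = prev
    return audio


mel = toy_diffusion_acoustic_model("Guten Tag")
audio = toy_autoregressive_vocoder(mel)
print(mel.shape, audio.shape)
```

The sequential loop in the second stage is the defining cost of autoregressive synthesis: each sample waits on the one before it, which is why such vocoders tend to trade generation speed for output quality.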
The model's explicit focus on European languages addresses a notable gap in the open-source landscape. While many high-performance TTS systems excel at English, generating natural-sounding speech for languages like German, French, or Polish with the same level of quality remains a challenge. KugelAudio aims to provide a strong baseline for these and other languages.
The model is available for download from the Hugging Face Hub. It is released under the KugelAudio Research License Agreement, which restricts its use to non-commercial research purposes. This makes it a valuable resource for academic exploration rather than for direct integration into commercial products.
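For readers who want to fetch the files, a minimal sketch using the `huggingface_hub` library's `snapshot_download` function follows. The repository ID is taken from the source link for this article. The wrapper name `download_model` is hypothetical, and whether the repository requires accepting the license terms or logging in before download is not stated here, so treat that as an open assumption.

```python
from typing import Optional

# Repository name as listed in this article's source link.
REPO_ID = "kugelaudio/kugelaudio-0-open"


def download_model(local_dir: Optional[str] = None) -> str:
    """Download all files in the repository and return the local folder path.

    Hypothetical helper: requires network access and the huggingface_hub package,
    and possibly authentication if the repository is gated (an assumption).
    """
    # Imported lazily so this module loads even when the dependency is not installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=REPO_ID, local_dir=local_dir)
```

Calling `download_model()` stores the files in the default Hugging Face cache, while passing `local_dir` places them in a folder of your choice. Any use of the downloaded weights remains subject to the non-commercial research license described above.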
Sources
- kugelaudio/kugelaudio-0-open (Hugging Face)