
OpenMOSS Releases KugelAudio for European Languages

The new text-to-speech model uses a hybrid diffusion and autoregressive architecture for high-quality, multilingual synthesis.

Jan 11, 2026


OpenMOSS has introduced KugelAudio-0-open, a text-to-speech (TTS) model designed to generate high-quality speech in several European languages. The release gives researchers and developers working on multilingual voice applications a new model to experiment with.

A Hybrid Approach to Synthesis

KugelAudio employs a two-stage architecture that combines two widely used techniques in audio generation. First, a diffusion-based model converts the input text into a mel-spectrogram, a time-frequency representation of the signal's energy on the perceptual mel scale. This spectrogram is then fed into an autoregressive vocoder, which synthesizes the final audio waveform sample by sample. The model was trained on an internal dataset of over 20,000 hours of multilingual audio.
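To make the data flow of such a two-stage design concrete, here is a deliberately tiny sketch. It is not KugelAudio's code: the function names, sizes, and the hand-written "denoiser" and "vocoder" rules are all hypothetical stand-ins for trained neural networks. It only shows the shape of the pipeline: noise is refined step by step into a mel-like grid, and each waveform sample is then produced from the current frame plus the previous sample.

```python
import math
import random

N_MELS = 8  # toy number of mel bands (assumption; real systems use many more)
HOP = 4     # toy number of waveform samples generated per mel frame (assumption)


def toy_text_encoder(text, n_frames):
    """Hypothetical stand-in for a text encoder: maps characters to a target grid."""
    codes = [ord(c) % 16 / 16.0 for c in text] or [0.0]
    return [[codes[(t + m) % len(codes)] for m in range(N_MELS)]
            for t in range(n_frames)]


def diffusion_stage(text, n_frames=6, steps=20, seed=0):
    """Stage 1: start from Gaussian noise and iteratively denoise it."""
    rng = random.Random(seed)
    cond = toy_text_encoder(text, n_frames)
    mel = [[rng.gauss(0.0, 1.0) for _ in range(N_MELS)] for _ in range(n_frames)]
    for step in range(steps):
        # A trained network would predict the noise to remove. Here each bin
        # simply moves a growing fraction of the way toward the text-derived target.
        alpha = 1.0 / (steps - step)
        mel = [[x + alpha * (c - x) for x, c in zip(row, cond_row)]
               for row, cond_row in zip(mel, cond)]
    return mel


def autoregressive_vocoder(mel, hop=HOP):
    """Stage 2: emit samples one at a time, each depending on the previous one."""
    wave, prev = [], 0.0
    for frame in mel:
        energy = sum(frame) / len(frame)
        for _ in range(hop):
            # Toy recurrence: 'prev' is the autoregressive feedback,
            # 'energy' is the conditioning taken from the mel frame.
            sample = math.tanh(0.5 * prev + energy)
            wave.append(sample)
            prev = sample
    return wave


def synthesize(text):
    """Run both stages: text -> mel-like grid -> waveform samples."""
    return autoregressive_vocoder(diffusion_stage(text))


if __name__ == "__main__":
    audio = synthesize("Guten Tag")
    print(len(audio), "samples")
```

The split matters in practice because the two stages have different costs: the diffusion loop runs a fixed number of refinement steps over the whole spectrogram, while the autoregressive loop runs once per output sample and cannot be parallelized across time.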

The model's explicit focus on European languages addresses a notable gap in the open-source landscape. While many high-performance TTS systems excel at English, generating natural-sounding speech for languages like German, French, or Polish with the same level of quality remains a challenge. KugelAudio aims to provide a strong baseline for these and other languages.

The model is available for download from the Hugging Face Hub. It is released under the KugelAudio Research License Agreement, which restricts its use to non-commercial research purposes. This makes it a valuable resource for academic exploration rather than for direct integration into commercial products.
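For readers who want to fetch the weights, a minimal sketch using the `huggingface_hub` client follows. The repository ID is the one listed under Sources; the local directory name and helper function names are arbitrary choices for this example. Whether the repository requires accepting the license on the Hub before download is not stated here, so treat that as something to check on the model page. Loading and running the model is not shown, since the article does not describe the inference API.

```python
REPO_ID = "kugelaudio/kugelaudio-0-open"  # repository named in the Sources list


def model_page_url(repo_id: str = REPO_ID) -> str:
    """Return the Hub page where the license terms can be reviewed."""
    return f"https://huggingface.co/{repo_id}"


def download_model(local_dir: str = "kugelaudio-0-open") -> str:
    """Download all files in the repository and return the local path.

    Imported lazily so the rest of this sketch works without the package
    installed (install it with: pip install huggingface_hub).
    """
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=REPO_ID, local_dir=local_dir)


if __name__ == "__main__":
    # Printing the URL performs no network access; call download_model()
    # explicitly once the license terms have been read.
    print(model_page_url())
```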

Sources

kugelaudio/kugelaudio-0-open (Hugging Face)


