Soprano-80M: A Tiny TTS Model Based on Qwen3

Developer 'ekwek' has released a compact 80-million-parameter text-to-speech model, notable for its unconventional use of a Qwen3 language model architecture.

Dec 17, 2025

The field of open-source voice generation has a new and intriguing entry with the release of Soprano-80M, a text-to-speech (TTS) model with just 80 million parameters. Its small size makes it a compelling option for applications where computational resources are limited.

What sets Soprano-80M apart is its technical foundation. The model is built upon the Qwen3 language model architecture, an unusual choice for a TTS system that highlights the versatility of modern LLM backbones. By adapting a powerful language model for audio synthesis, the project explores an alternative path to generating high-quality speech.

The model's compact footprint is its key advantage. Small, efficient models like Soprano-80M are critical for enabling on-device or edge computing applications, from smart assistants to accessibility tools, without relying on cloud-based APIs. This lowers the barrier to entry for developers and researchers experimenting with voice synthesis.
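To put the 80-million-parameter figure in perspective, here is a rough sketch of how much storage the weights alone would need at common numeric precisions. The parameter count comes from the release; the byte widths are standard formats chosen for illustration, since it is not stated here which precision the published checkpoint uses. The estimate ignores activations, runtime overhead, and any components not counted in the 80M figure.

```python
# Back-of-the-envelope estimate of weight storage for an 80M-parameter model.
# PARAM_COUNT is taken from the article; the byte widths below are common
# numeric formats used for illustration, not confirmed release formats.

PARAM_COUNT = 80_000_000

BYTES_PER_PARAM = {
    "float32": 4,
    "float16/bfloat16": 2,
    "int8": 1,
}


def weight_size_mb(param_count: int, bytes_per_param: int) -> float:
    """Return weight storage in megabytes (1 MB = 10**6 bytes)."""
    return param_count * bytes_per_param / 1_000_000


if __name__ == "__main__":
    for dtype, width in BYTES_PER_PARAM.items():
        print(f"{dtype:>17}: {weight_size_mb(PARAM_COUNT, width):.0f} MB")
```

Under these assumptions the weights come to roughly 320 MB at 32-bit precision, 160 MB at 16-bit, and 80 MB at 8-bit, which is within reach of phones and single-board computers.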

The model is released under the permissive Apache 2.0 license, which allows both research and commercial use. The weights and getting-started instructions are available now in the ekwek/Soprano-80M repository on Hugging Face.

Sources

ekwek/Soprano-80M
Hugging Face
