
Soprano TTS Model Leverages Qwen3 Architecture

The new 80-million-parameter text-to-speech model adapts a powerful language model architecture for efficient, open-source audio generation.

Jan 14, 2026


A new open-source text-to-speech model, Soprano-1.1-80M, has been published on the Hugging Face Hub under the ekwek account. At just 80 million parameters it is exceptionally compact, and it is available under the permissive Apache 2.0 license.

What sets Soprano apart is its foundation. The model adapts the architecture of Qwen3, a model family best known for large-scale text generation. Reusing a modern large language model (LLM) architecture for the specialized task of text-to-speech (TTS) is an increasingly common way to carry these designs over to other modalities.

The model's small size is a significant advantage, making it suitable for developers who need to run speech synthesis on consumer-grade hardware or in resource-constrained environments. By combining this efficiency with an open license, Soprano lowers the barrier for integrating custom, high-quality voice generation into a wide range of applications, from accessibility tools to creative projects.
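To put the size in perspective, the weight footprint can be estimated with simple arithmetic. The sketch below is a rough illustration, not a measurement. It assumes the parameter count is exactly 80 million (the article's rounded figure) and uses common numeric precisions; the article does not say which precision the released checkpoint uses, and the estimate covers weights only, not activations or any additional audio decoding components.

```python
# Back-of-the-envelope estimate of weight storage for an 80M-parameter model.
# Assumption: exactly 80,000,000 parameters (the article's rounded figure).
PARAMS = 80_000_000

# Bytes per parameter for common precisions (illustrative choices; the
# article does not state which one the released checkpoint uses).
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}


def weight_megabytes(n_params: int, bytes_per_param: int) -> float:
    """Return the weight-only footprint in megabytes (1 MB = 10**6 bytes)."""
    return n_params * bytes_per_param / 1_000_000


for dtype, size in BYTES_PER_PARAM.items():
    print(f"{dtype}: ~{weight_megabytes(PARAMS, size):.0f} MB")
# float32: ~320 MB
# float16: ~160 MB
# int8: ~80 MB
```

Even at full 32-bit precision the weights come to roughly a third of a gigabyte, which is consistent with the claim that the model can run on consumer-grade hardware.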

Soprano-1.1-80M is available for download and experimentation now from the Hugging Face Hub.
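For readers who want to fetch the files, one possible approach is the `huggingface_hub` Python package. The sketch below assumes that package is installed and that the repository ID matches the one listed under Sources; the helper names `hub_url` and `download_model` are hypothetical and exist only for this illustration. It downloads the repository contents and does not show how to run inference, which depends on the model's own tooling.

```python
# Minimal sketch: fetch the model files from the Hugging Face Hub.
# Assumptions: `huggingface_hub` is installed (pip install huggingface_hub)
# and the repo ID below matches the listing under Sources.
from typing import Optional

REPO_ID = "ekwek/Soprano-1.1-80M"


def hub_url(repo_id: str) -> str:
    """Build the web URL of a model repository on the Hub."""
    return f"https://huggingface.co/{repo_id}"


def download_model(repo_id: str = REPO_ID, local_dir: Optional[str] = None) -> str:
    """Download every file in the repository and return the local path.

    The import is done lazily so this module can be loaded even when
    huggingface_hub is not installed.
    """
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id, local_dir=local_dir)


print(hub_url(REPO_ID))
```

Calling `download_model()` with no arguments stores the files in the default Hugging Face cache; passing `local_dir` places them in a directory of your choice instead.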

Sources

ekwek/Soprano-1.1-80M (Hugging Face)
