SoulX-Podcast 1.7B Offers Open Multi-Speaker TTS
The new 1.7 billion-parameter model from Soul AI Lab is trained on conversational data to generate natural dialogue in English and Chinese.

Soul AI Lab has introduced SoulX-Podcast 1.7B, an open-source model designed to generate natural, conversational audio. Released under an Apache 2.0 license, the 1.7 billion-parameter text-to-speech (TTS) system is engineered specifically for podcast-style conversations among multiple speakers.
Built upon the Qwen3 architecture, SoulX-Podcast is tailored for both English and Chinese, making it a versatile tool for bilingual applications. According to the project's release materials, the model was trained to capture the nuances of human dialogue, aiming to produce audio that is more dynamic and engaging than standard single-speaker TTS outputs.
The release reflects growing interest in more sophisticated open-source audio generation. While many TTS models excel at reading prepared text, high-quality multi-speaker conversational models are less common. SoulX-Podcast could enable developers to build more realistic AI agents, create dynamic audio content, or prototype new forms of interactive storytelling without relying on proprietary APIs.
Sources
- Soul-AILab/SoulX-Podcast-1.7B (Hugging Face)