MiraTTS Brings Qwen2 to Bilingual Speech Synthesis
A new text-to-speech model from OpenMOSS leverages the Qwen2 architecture to generate speech in both English and Chinese.
A new open-source text-to-speech model named MiraTTS has been released by the OpenMOSS team. The model is notable for its bilingual capabilities, generating speech in both English and Chinese, and for being built on the Qwen2 language model architecture.
MiraTTS employs a two-stage design for speech synthesis. The first stage uses the Qwen2-based model to convert input text into a spectrogram, a representation of how the frequency content of the audio changes over time. This spectrogram is then passed to a separate vocoder model, which produces the final audio waveform.
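To make the two-stage flow concrete, here is a minimal toy sketch of the general pattern in Python. It is not MiraTTS code: the function names, the constants, and the sine-wave "vocoder" are all hypothetical stand-ins, and a real system would use neural networks for both stages.

```python
from __future__ import annotations

import math

# All values below are illustrative assumptions, not MiraTTS settings.
SAMPLE_RATE = 16_000  # output sample rate in Hz
HOP = 160             # audio samples produced per spectrogram frame
N_BINS = 8            # frequency bins per spectrogram frame


def text_to_spectrogram(text: str) -> list[list[float]]:
    """Stage 1 stand-in: emit one frame per character, with one active bin.

    A real acoustic model would predict these frames with a neural network.
    """
    frames = []
    for ch in text:
        frame = [0.0] * N_BINS
        frame[ord(ch) % N_BINS] = 1.0
        frames.append(frame)
    return frames


def vocoder(spectrogram: list[list[float]]) -> list[float]:
    """Stage 2 stand-in: render each frame as a sum of sinusoids."""
    waveform = []
    for i, frame in enumerate(spectrogram):
        for n in range(HOP):
            t = (i * HOP + n) / SAMPLE_RATE
            sample = sum(
                energy * math.sin(2 * math.pi * 200.0 * (b + 1) * t)
                for b, energy in enumerate(frame)
            )
            waveform.append(sample / N_BINS)
    return waveform


def synthesize(text: str) -> list[float]:
    """Chain the two stages: text -> spectrogram -> waveform."""
    return vocoder(text_to_spectrogram(text))


audio = synthesize("hello")
print(len(audio))  # 5 frames * 160 samples per frame = 800
```

The point of the split is that the two stages only share the spectrogram as an interface, so in principle either one can be swapped or optimized without touching the other.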
From Research to Application
The design also has practical advantages for developers. MiraTTS supports export to ONNX (Open Neural Network Exchange), an open model format that lets the same model run on different inference runtimes and hardware. That deployment readiness, combined with bilingual support, makes it a useful component for non-commercial applications.
The model is available for experimentation in its Hugging Face repository, released under the Creative Commons Attribution-NonCommercial 4.0 license (CC BY-NC 4.0), which permits academic and personal use but not commercial use.
Sources
- YatharthS/MiraTTS (Hugging Face)