
MiraTTS Brings Qwen2 to Bilingual Speech Synthesis

A new text-to-speech model from OpenMOSS leverages the Qwen2 architecture to generate speech in both English and Chinese.

Dec 17, 2025

CC BY-NC 4.0

The OpenMOSS team has released MiraTTS, a new open-source text-to-speech model. Two things set it apart: it is bilingual, generating speech in both English and Chinese, and it is built on the Qwen2 language model architecture.

MiraTTS uses a two-stage design for speech synthesis. In the first stage, the Qwen2-based model converts input text into a spectrogram, a representation of how the energy at each sound frequency changes over time. In the second stage, a separate vocoder model turns that spectrogram into the final audio waveform.
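The two-stage flow can be sketched in a few lines of Python. This is a toy illustration only, not MiraTTS code: `toy_acoustic_model`, `toy_vocoder`, and `synthesize` are hypothetical names, and both stages are stand-ins that show only how data moves from text to spectrogram to waveform.

```python
import math
from typing import Callable, List

Spectrogram = List[List[float]]  # frames x frequency bins
Waveform = List[float]           # audio samples in [-1, 1]


def toy_acoustic_model(text: str, n_bins: int = 4) -> Spectrogram:
    """Hypothetical stand-in for stage 1 (the language-model stage).

    Emits one flat frame per character; a real model would predict
    meaningful spectral content instead.
    """
    frames = []
    for ch in text:
        level = (ord(ch) % 32) / 32.0  # arbitrary value in [0, 1)
        frames.append([level] * n_bins)
    return frames


def toy_vocoder(spec: Spectrogram, hop: int = 8) -> Waveform:
    """Hypothetical stand-in for stage 2 (the vocoder).

    Expands every frame into `hop` samples of a sine wave scaled by
    the frame's mean energy.
    """
    samples = []
    for frame in spec:
        amp = sum(frame) / len(frame)
        for i in range(hop):
            samples.append(amp * math.sin(2 * math.pi * i / hop))
    return samples


def synthesize(
    text: str,
    acoustic_model: Callable[[str], Spectrogram],
    vocoder: Callable[[Spectrogram], Waveform],
) -> Waveform:
    """Chain the two stages: text -> spectrogram -> waveform."""
    spec = acoustic_model(text)
    return vocoder(spec)


wave = synthesize("hi", toy_acoustic_model, toy_vocoder)
```

Keeping the stages behind separate callables mirrors the design described above: in a pipeline of this shape, the vocoder can be replaced without touching the text-to-spectrogram model.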

From Research to Application

The model's architecture offers practical advantages for developers. MiraTTS supports export to ONNX, an open model-interchange format that lets the same model run on a range of runtimes and hardware. This focus on deployment readiness, combined with bilingual support, makes it a practical building block for non-commercial applications.

The model is available for experimentation in its Hugging Face repository. It is released under the Creative Commons Attribution-NonCommercial license (CC BY-NC 4.0), which permits non-commercial uses such as academic and personal projects.

Sources

YatharthS/MiraTTS, Hugging Face


