The Open Weights

The daily record of open-source AI. New model releases, leaderboards, and what's coming next — written for people who ship.

Refreshed every 12 hours

© 2026 The Open Weights. An independent publication.

Aggregated by Claude · written with Gemini · curated by humans.

Category · audio

Latest Text → Speech models

The newest open-source Text → Speech releases, from across the ecosystem.

35 releases

Zyphra/Text → Speech

Zyphra Releases Open-Source Zonos 2 TTS Model

The new text-to-speech model offers a commercially permissive alternative for developers in a field still dominated by closed-source APIs.

Jun 11, 2026
Text → Speech
Zonos 2
Boson AI/Text → Speech

Boson AI's Higgs Audio v3 Offers Expressive, Multilingual TTS

The new 4-billion-parameter text-to-speech model is available for non-commercial use, promising fine-grained control over vocal delivery.

Jun 4, 2026
Text → Speech
Higgs Audio v3 TTS 4B
Supertone/Text → Speech

Supertone Releases On-Device Multilingual TTS Model

The new Supertonic 3 model supports seven languages and is optimized for local inference with the portable ONNX format.

May 6, 2026
Text → Speech
Supertonic 3
Resemble AI/Text → Speech

Resemble AI Releases Dramabox Voice Cloning TTS Model

The new text-to-speech model uses a diffusion-transformer architecture for high-quality, expressive audio and one-shot voice cloning.

Apr 17, 2026
Text → Speech
Dramabox TTS
Hume AI/Text → Speech

Hume AI Releases 3B Multilingual Text-to-Speech Model

The new model, Tada-3B-ML, is designed for fine-grained control over vocal expression across more than 10 languages.

Feb 16, 2026
Text → Speech
Tada-3B-ML
nineninesix/Text → Speech

Kani-TTS-2 Offers New Open-Source Voice Generation

An independent researcher has released a new English text-to-speech model under a permissive license, built on a modern generative foundation.

Feb 12, 2026
Text → Speech
Kani-TTS-2 (English)
OpenMOSS/Text → Speech

MOSS-TTS: A New Multilingual Text-to-Speech Model

The new system from the OpenMOSS Team uses a novel 'delay-pattern' architecture to generate natural-sounding speech in Chinese, English, and Japanese.

Feb 6, 2026
Text → Speech
MOSS-TTS
Soul-AILab/Music

Soul-AILab Releases Zero-Shot Singing Voice Model

The new model, SoulX-Singer, can replicate a singing voice from a short audio sample and supports both English and Chinese under a permissive license.

Feb 6, 2026
Music · Text → Speech
SoulX-Singer
OpenMOSS/Text → Speech

LuxTTS Delivers Lightweight, Open-Source Speech Synthesis

The new text-to-speech model is optimized for the ONNX runtime, making it a promising option for efficient, on-device audio generation.

Jan 22, 2026
Text → Speech
LuxTTS
Qwen · Alibaba/Text → Speech

Qwen Releases Open-Source Voice Cloning Model

The new 600-million-parameter Qwen3-TTS model can generate speech in multiple languages and clone voices from short audio clips.

Jan 21, 2026
Text → Speech
Qwen3-TTS 0.6B Base
Qwen · Alibaba/Text → Speech

Qwen Releases a Compact Custom-Voice TTS Model

The new 600-million-parameter model from Alibaba's Qwen team can clone voices from short audio clips for multilingual speech synthesis.

Jan 21, 2026
Text → Speech
Qwen3-TTS-12Hz-0.6B CustomVoice
OpenMOSS/Text → Speech

Soprano TTS Model Leverages Qwen3 Architecture

The new 80-million-parameter text-to-speech model adapts a powerful language model architecture for efficient, open-source audio generation.

Jan 14, 2026
Text → Speech
Soprano-1.1-80M
Hume AI/Text → Speech

Hume AI Releases TADA 1B for Expressive Speech

The new 1-billion-parameter model combines a Llama 3.2 base with text-to-speech to generate more natural and nuanced audio.

Jan 12, 2026
Text → Speech
TADA 1B
OpenMOSS/Text → Speech

OpenMOSS Releases KugelAudio for European Languages

The new text-to-speech model uses a hybrid diffusion and autoregressive architecture for high-quality, multilingual synthesis.

Jan 11, 2026
Text → Speech
KugelAudio-0-open
Supertone/Text → Speech

Supertone Open-Sources Supertonic 2 Voice Model

The new text-to-speech model from the audio AI company supports English, Korean, and Spanish and comes in the efficient ONNX format for deployment.

Jan 6, 2026
Text → Speech
Supertonic 2
Qwen · Alibaba/Any-to-Any

Qwen's Fun-Audio-Chat: An Open Speech-to-Speech LLM

The 8-billion-parameter model from Alibaba's Qwen team understands and generates spoken responses, enabling more natural audio-first applications.

Dec 23, 2025
Speech → Text · Any-to-Any
Fun-Audio-Chat-8B
OpenMOSS/Text → Speech

MiraTTS Brings Qwen2 to Bilingual Speech Synthesis

A new text-to-speech model from OpenMOSS leverages the Qwen2 architecture to generate speech in both English and Chinese.

Dec 17, 2025
Text → Speech
MiraTTS
Microsoft/Text → Speech

Microsoft Releases VibeVoice for Real-Time AI Speech

The new 500-million-parameter model is designed for generating natural, long-form speech with very low latency for interactive applications.

Dec 4, 2025
Text → Speech
VibeVoice Realtime 0.5B
Nari Labs/Text → Speech

Nari Labs Releases Dia2-2B, an Open Voice Cloning Model

The 2-billion-parameter text-to-speech model can clone voices from a short audio sample and is available under an Apache 2.0 license.

Nov 15, 2025
Text → Speech
Dia2-2B
OpenMOSS/Text → Speech

SoulX-Podcast 1.7B Offers Open Multi-Speaker TTS

The new 1.7 billion-parameter model from OpenMOSS is trained on conversational data to generate natural dialogue in English and Chinese.

Oct 27, 2025
Text → Speech
SoulX-Podcast 1.7B
Maya Research/Text → Speech

Maya Research Releases Maya1, an Expressive TTS Model

The new Apache 2.0 licensed model uses a Llama-based architecture to generate more natural and emotionally nuanced speech from text.

Oct 18, 2025
Text → Speech
Maya1
nineninesix/Text → Speech

Kani TTS 370M Offers Compact Multilingual Speech

Built on Liquid AI's LFM2 architecture, the new 370-million-parameter model offers developers a compact option for multilingual speech synthesis.

Sep 30, 2025
Text → Speech
Kani TTS 370M
inclusionAI/Any-to-Any

Ming-UniAudio Brings MoE to Unified Audio AI

A new 16-billion-parameter model from inclusionAI uses a Mixture-of-Experts architecture to handle a wide range of audio tasks efficiently.

Sep 29, 2025
Speech → Text · Any-to-Any
Ming-UniAudio-16B-A3B
Qwen · Alibaba/Any-to-Any · Major release

Qwen3-Omni Arrives With Any-to-Any Multimodality

The new 30B Mixture-of-Experts model from Alibaba's Qwen team can process and generate content across text, image, and audio formats.

Sep 20, 2025
Speech → Text · Any-to-Any
Qwen3-Omni-30B-A3B-Instruct
Xiaomi/Any-to-Any

Xiaomi's MiMo-Audio 7B Tackles Complex Speech Tasks

This new instruction-tuned model from Xiaomi can handle a flexible combination of audio and text inputs and outputs, from transcription to voice synthesis.

Sep 18, 2025
Speech → Text · Any-to-Any
MiMo-Audio-7B-Instruct
OpenBMB/Text → Speech

OpenBMB Releases VoxCPM for Open Voice Synthesis

The new 500-million-parameter model offers high-quality text-to-speech and zero-shot voice cloning under a permissive license.

Sep 16, 2025
Text → Speech
VoxCPM-0.5B
Qwen · Alibaba/Any-to-Any

Qwen Releases 30B Model for Audio Captioning

The new Mixture-of-Experts model from Alibaba is fine-tuned to generate detailed, multilingual descriptions for complex audio content.

Sep 15, 2025
Any-to-Any · Text → Speech
Qwen3-Omni-30B-A3B-Captioner
neuphonic/Text → Speech

Neuphonic Releases NeuTTS Air for On-Device AI Speech

The new Apache 2.0 text-to-speech model is built on a Qwen2 architecture and optimized for local inference with GGUF support.

Sep 15, 2025
Text → Speech
NeuTTS Air
Microsoft/Text → Speech

Microsoft Releases VibeVoice, a 7B Podcast TTS Model

The new 7-billion-parameter model is designed for generating long-form, multi-speaker audio in English and Chinese under a permissive MIT license.

Sep 4, 2025
Text → Speech
VibeVoice-7B
Microsoft/Text → Speech

Microsoft Releases VibeVoice, a Podcast-Ready TTS Model

The new open-source model specializes in generating long-form, multi-speaker audio in both English and Mandarin, mimicking a natural podcast conversation.

Sep 4, 2025
Text → Speech
VibeVoice Large
StepFun/Any-to-Any

StepFun Releases Step-Audio 2 mini, a Unified Audio AI

The new open-source model handles both speech recognition and audio generation in a single, end-to-end architecture.

Aug 28, 2025
Speech → Text · Any-to-Any
Step-Audio 2 mini
Microsoft/Text → Speech

Microsoft Releases VibeVoice for Long-Form Audio

The new 1.5-billion-parameter text-to-speech model is designed to generate natural, multi-speaker audio for podcasts and other long-form content.

Aug 25, 2025
Text → Speech
VibeVoice-1.5B
Boson AI/Text → Speech

Boson AI Releases Higgs Audio v2 for Expressive TTS

The new 3-billion-parameter model focuses on generating expressive, multilingual speech and is fully open for commercial use under an Apache 2.0 license.

Jul 1, 2025
Text → Speech
Higgs Audio v2 (3B)
Kyutai/Text → Speech

Kyutai Releases 1.6B Bilingual TTS Model

The French AI lab's new open-source model generates streaming audio in English and French under a permissive license.

Jun 30, 2025
Text → Speech
Kyutai TTS 1.6B (en/fr)
Maya Research/Text → Speech

Veena TTS Model Targets Indian Languages with Llama Base

Maya Research has released a 3-billion-parameter model designed to generate natural-sounding speech in Hindi and English.

Jun 24, 2025
Text → Speech
Veena