Latest open-source Text → Speech models

Audio8/Text → Speech

Audio8 debuts a 0.6B multilingual zero-shot TTS preview

The compact text-to-speech model promises voice cloning across languages from a footprint small enough to run without heavy hardware.

Jul 28, 2026

Text → Speech

NVIDIA/Any-to-Any

NVIDIA's Audex Unifies Audio Understanding and Speech

A new 30B mixture-of-experts model from NVIDIA handles both listening and speaking within a single audio-text architecture.

Jul 6, 2026

Any-to-Any Reasoning

nineninesix/Text → Speech

Gepard 1.0 brings bilingual TTS with voice cloning

An open, autoregressive text-to-speech model targets English and Spanish under a permissive Apache 2.0 license.

Jun 22, 2026

Text → Speech

Owensong/Text → Speech

Resemble AI's Inflect-Nano-v1 puts TTS on local hardware

An ultra-small, experimental text-to-speech model arrives under Apache 2.0, aimed at running speech synthesis directly on local machines.

Jun 16, 2026

Text → Speech

Zyphra/Text → Speech

Zyphra Releases Open-Source Zonos 2 TTS Model

The new text-to-speech model offers a commercially permissive alternative for developers in a field still dominated by closed-source APIs.

Jun 11, 2026

Text → Speech

Bosonai/Text → Speech

Boson AI's Higgs Audio v3 Offers Expressive, Multilingual TTS

The new 4-billion-parameter text-to-speech model is available for non-commercial use, promising fine-grained control over vocal delivery.

Jun 4, 2026

Text → Speech

Bosonai/Text → Speech

Higgs TTS 3 lands as a 4B multilingual speech model

Boson AI's new text-to-speech release aims for expressive, controllable voice synthesis across multiple languages.

Jun 4, 2026

Text → Speech

Bosonai/Text → Speech

Boson AI releases Higgs TTS v3, a 4B speech model

The new open-weights text-to-speech system targets expressive, controllable voice generation across multiple languages with built-in voice cloning.

Jun 4, 2026

Text → Speech

OpenMOSS/Text → Speech

MOSS-TTS Aims for More Robust Speech Synthesis

A new text-to-speech model introduces 'delay-pattern decoding' to solve common word skipping and repetition errors in parallel generation.

May 25, 2026

Text → Speech

MisoLabs/Text → Speech

MisoLabs Debuts MisoTTS, an Open Voice Model

The new text-to-speech system adapts the decoder-only architecture of language models like Llama to generate more natural-sounding speech.

May 21, 2026

Text → Speech

Supertone/Text → Speech

Supertone Releases On-Device Multilingual TTS Model

The new Supertonic 3 model supports seven languages and is optimized for local inference with the portable ONNX format.

May 6, 2026

Text → Speech

Resemble AI/Text → Speech

Resemble AI Releases Dramabox Voice Cloning TTS Model

The new text-to-speech model uses a diffusion-transformer architecture for high-quality, expressive audio and one-shot voice cloning.

Apr 17, 2026

Text → Speech

OpenBMB/Text → Speech

OpenBMB Releases VoxCPM2 for Expressive TTS

The new diffusion-based model from the OpenBMB research group supports multilingual speech, emotional control, and zero-shot voice cloning.

Apr 3, 2026

Text → Speech

OpenMOSS/Text → Speech

MOSS-TTS-Nano Delivers Multilingual Speech at 100M Params

The new open-source model from OpenMOSS-Team generates high-quality speech in multiple languages while maintaining a remarkably small footprint.

Apr 2, 2026

Text → Speech

KRAFTON/Any-to-Any

KRAFTON Releases 9B Bilingual Speech Model

The gaming giant behind 'PUBG' has released Raon-Speech-9B, a multimodal model for English and Korean speech recognition and synthesis.

Mar 30, 2026

Speech → Text Any-to-Any

k2-fsa/Text → Speech

OmniVoice TTS Offers Zero-Shot Multilingual Voice Cloning

A new open-source text-to-speech model from the k2-fsa project can replicate a voice and generate speech in multiple languages from a single short audio sample.

Mar 30, 2026

Text → Speech

HKUSTAudio/Any-to-Any

HKUST Releases Audio-Omni, a Unified Audio Model

The new diffusion-based model handles speech, music, and general audio tasks like conversion and editing within a single, versatile framework.

Mar 27, 2026

Any-to-Any Music

Aratako/Text → Speech

Irodori-TTS v2 Offers Open Japanese Speech Synthesis

The 500-million-parameter model from researcher Aratako provides a high-quality, single-speaker voice under a permissive MIT license.

Mar 23, 2026

Text → Speech

Fishaudio/Text → Speech

Fish Audio's S2-Pro Brings Expressive TTS to Open Source

The new text-to-speech model can follow natural language instructions to control tone, clone voices from short clips, and speak multiple languages.

Mar 9, 2026

Text → Speech

HumeAI/Text → Speech

Hume AI Releases 3B Multilingual Text-to-Speech Model

The new model, Tada-3B-ML, is designed for fine-grained control over vocal expression across more than 10 languages.

Feb 16, 2026

Text → Speech

nineninesix/Text → Speech

Kani-TTS-2 Offers New Open-Source Voice Generation

An independent researcher has released a new English text-to-speech model under a permissive license, built on a modern generative foundation.

Feb 12, 2026

Text → Speech

OpenMOSS/Text → Speech

MOSS-TTS: A New Multilingual Text-to-Speech Model

The new system from the OpenMOSS Team uses a novel 'delay-pattern' architecture to generate natural-sounding speech in Chinese, English, and Japanese.

Feb 6, 2026

Text → Speech

Soul AILab/Music

Soul-AILab Releases Zero-Shot Singing Voice Model

The new model, SoulX-Singer, can replicate a singing voice from a short audio sample and supports both English and Chinese under a permissive license.

Feb 6, 2026

Music Text → Speech

YatharthS/Text → Speech

LuxTTS Delivers Lightweight, Open-Source Speech Synthesis

The new text-to-speech model is optimized for the ONNX runtime, making it a promising option for efficient, on-device audio generation.

Jan 22, 2026

Text → Speech

Qwen · Alibaba/Text → Speech

Qwen Releases Open-Source Voice Cloning Model

The new 600-million-parameter Qwen3-TTS model can generate speech in multiple languages and clone voices from short audio clips.

Jan 21, 2026

Text → Speech

Qwen · Alibaba/Text → Speech

Qwen Releases a Compact Custom-Voice TTS Model

The new 600-million-parameter model from Alibaba's Qwen team can clone voices from short audio clips for multilingual speech synthesis.

Jan 21, 2026

Text → Speech

Qwen · Alibaba/Text → Speech

Qwen Releases Open 1.7B Custom Voice Synthesis Model

Alibaba's Qwen team has released a new text-to-speech model capable of cloning voices from just a few seconds of audio.

Jan 21, 2026

Text → Speech

Qwen · Alibaba/Text → Speech

Qwen Unveils Open Model for Custom Voice Synthesis

The new 1.7-billion-parameter text-to-speech model from Alibaba's Qwen team can generate novel voices from short audio prompts.

Jan 21, 2026

Text → Speech

ekwek/Text → Speech

Soprano TTS Model Leverages Qwen3 Architecture

The new 80-million-parameter text-to-speech model adapts a powerful language model architecture for efficient, open-source audio generation.

Jan 14, 2026

Text → Speech

HumeAI/Text → Speech

Hume AI Releases TADA 1B for Expressive Speech

The new 1-billion-parameter model builds text-to-speech on a Llama 3.2 base to generate more natural and nuanced audio.

Jan 12, 2026

Text → Speech

Kugelaudio/Text → Speech

KugelAudio Releases an Open TTS Model for European Languages

The new text-to-speech model uses a hybrid diffusion and autoregressive architecture for high-quality, multilingual synthesis.

Jan 11, 2026

Text → Speech

Supertone/Text → Speech

Supertone Open-Sources Supertonic 2 Voice Model

The new text-to-speech model from the audio AI company supports English, Korean, and Spanish and comes in the efficient ONNX format for deployment.

Jan 6, 2026

Text → Speech

Qwen · Alibaba/Any-to-Any

Qwen's Fun-Audio-Chat: An Open Speech-to-Speech LLM

The 8-billion-parameter model from Alibaba's Qwen team understands and generates spoken responses, enabling more natural audio-first applications.

Dec 23, 2025

Speech → Text Any-to-Any

YatharthS/Text → Speech

MiraTTS Brings Qwen2 to Bilingual Speech Synthesis

A new text-to-speech model from developer YatharthS leverages the Qwen2 architecture to generate speech in both English and Chinese.

Dec 17, 2025

Text → Speech

ekwek/Text → Speech

Soprano-80M: A Tiny TTS Model Based on Qwen3

Developer 'ekwek' has released a compact 80-million-parameter text-to-speech model, notable for its unconventional use of a Qwen3 language model architecture.

Dec 17, 2025

Text → Speech

Qwen · Alibaba/Text → Speech

Alibaba Releases CosyVoice 3 for Expressive TTS

The new 500-million-parameter text-to-speech model from Alibaba offers multilingual voice cloning and emotional control.

Dec 11, 2025

Text → Speech

Zhipu AI/Text → Speech

Zhipu AI Releases GLM-TTS for Zero-Shot Voice Cloning

This new text-to-speech model can replicate a voice from just a few seconds of audio, using a novel combination of flow matching and reinforcement learning.

Dec 10, 2025

Text → Speech

OpenBMB/Text → Speech

VoxCPM 1.5 Brings Open-Source Voice Cloning

The new 500-million-parameter text-to-speech model from OpenBMB supports both English and Chinese and can replicate a voice from a short audio sample.

Dec 5, 2025

Text → Speech

Microsoft/Text → Speech

Microsoft Releases VibeVoice for Real-Time AI Speech

The new 500-million-parameter model is designed for generating natural, long-form speech with very low latency for interactive applications.

Dec 4, 2025

Text → Speech

Resemble AI/Text → Speech

Resemble AI Releases Chatterbox Turbo for Open TTS

The new text-to-speech model focuses on performance and offers voice cloning capabilities for English under a permissive MIT license.

Dec 2, 2025

Text → Speech

Mistral AI/Text → Speech

Mistral AI Releases Voxtral, an Open-Source TTS Model

The French AI leader expands beyond large language models with a new, 4-billion-parameter model for generating multilingual speech.

Nov 17, 2025

Text → Speech

Nari Labs/Text → Speech

Nari Labs Releases Dia2-2B, an Open Voice Cloning Model

The 2-billion-parameter text-to-speech model can clone voices from a short audio sample and is available under an Apache 2.0 license.

Nov 15, 2025

Text → Speech

Soul AILab/Text → Speech

SoulX-Podcast 1.7B Offers Open Multi-Speaker TTS

The new 1.7-billion-parameter model from Soul AI Lab is trained on conversational data to generate natural dialogue in English and Chinese.

Oct 27, 2025

Text → Speech

Maya Research/Text → Speech

Maya Research Releases Maya1, an Expressive TTS Model

The new Apache 2.0 licensed model uses a Llama-based architecture to generate more natural and emotionally nuanced speech from text.

Oct 18, 2025

Text → Speech

nineninesix/Text → Speech

Kani TTS 370M Offers Compact Multilingual Speech

Built on Liquid AI's LFM2 language model architecture, the compact 370-million-parameter model targets efficient multilingual speech synthesis.

Sep 30, 2025

Text → Speech

inclusionAI/Any-to-Any

Ming-UniAudio Brings MoE to Unified Audio AI

A new 16-billion-parameter model from inclusionAI uses a Mixture-of-Experts architecture to handle a wide range of audio tasks efficiently.

Sep 29, 2025

Speech → Text Any-to-Any

Qwen · Alibaba/Any-to-Any · Major release

Qwen3-Omni Arrives With Any-to-Any Multimodality

The new 30B Mixture-of-Experts model from Alibaba's Qwen team accepts text, image, audio, and video input and responds in both text and speech.

Sep 20, 2025

Speech → Text Any-to-Any

Xiaomi/Any-to-Any

Xiaomi's MiMo-Audio 7B Tackles Complex Speech Tasks

This new instruction-tuned model from Xiaomi can handle a flexible combination of audio and text inputs and outputs, from transcription to voice synthesis.

Sep 18, 2025

Speech → Text Any-to-Any

OpenBMB/Text → Speech

OpenBMB Releases VoxCPM for Open Voice Synthesis

The new 500-million-parameter model offers high-quality text-to-speech and zero-shot voice cloning under a permissive license.

Sep 16, 2025

Text → Speech

Qwen · Alibaba/Any-to-Any

Qwen Releases 30B Model for Audio Captioning

The new Mixture-of-Experts model from Alibaba is fine-tuned to generate detailed, multilingual descriptions for complex audio content.

Sep 15, 2025

Any-to-Any Text → Speech

neuphonic/Text → Speech

Neuphonic Releases NeuTTS Air for On-Device AI Speech

The new Apache 2.0 text-to-speech model is built on a Qwen2 architecture and optimized for local inference with GGUF support.

Sep 15, 2025

Text → Speech

Vibevoice/Text → Speech

Microsoft Releases VibeVoice, a 7B Podcast TTS Model

The new 7-billion-parameter model is designed for generating long-form, multi-speaker audio in English and Chinese under a permissive MIT license.

Sep 4, 2025

Text → Speech

Aoi Ot/Text → Speech

Microsoft Releases VibeVoice, a Podcast-Ready TTS Model

The new open-source model specializes in generating long-form, multi-speaker audio in both English and Mandarin, mimicking a natural podcast conversation.

Sep 4, 2025

Text → Speech

StepFun/Any-to-Any

StepFun Releases Step-Audio 2 mini, a Unified Audio AI

The new open-source model handles both speech recognition and audio generation in a single, end-to-end architecture.

Aug 28, 2025

Speech → Text Any-to-Any

Microsoft/Text → Speech

Microsoft Releases VibeVoice for Long-Form Audio

The new 1.5-billion-parameter text-to-speech model is designed to generate natural, multi-speaker audio for podcasts and other long-form content.

Aug 25, 2025

Text → Speech

Bosonai/Text → Speech

Boson AI Releases Higgs Audio v2 for Expressive TTS

The new 3-billion-parameter model focuses on generating expressive, multilingual speech and is fully open for commercial use under an Apache 2.0 license.

Jul 1, 2025

Text → Speech

Kyutai/Text → Speech

Kyutai Releases 1.6B Bilingual TTS Model

The French AI lab's new open-source model generates streaming audio in English and French under a permissive license.

Jun 30, 2025

Text → Speech

Maya Research/Text → Speech

Veena TTS Model Targets Indian Languages with Llama Base

Maya Research has released a 3-billion-parameter model designed to generate natural-sounding speech in Hindi and English.

Jun 24, 2025

Text → Speech
