Supertone/Text → Speech

Supertone Open-Sources Supertonic 2 Voice Model

The audio AI company's new text-to-speech model supports English, Korean, and Spanish and ships in the ONNX format for efficient deployment.

Jan 6, 2026

Notable · OpenRAIL-M

Audio AI company Supertone has released Supertonic 2, a new open-source model for generating speech from text. The model stands out for its multilingual capabilities, with support for languages including English, Korean, and Spanish. The release adds another contender to the growing ecosystem of open text-to-speech (TTS) systems.

Unlike many research-focused releases, Supertonic 2 is distributed in the ONNX (Open Neural Network Exchange) format. This makes it easier for developers to integrate and run the model efficiently across different platforms and hardware, signaling that it was designed with practical application in mind.
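As a rough illustration of what ONNX distribution means in practice, the sketch below loads an ONNX file with ONNX Runtime and lists the inputs the model expects. It is a minimal sketch under stated assumptions, not Supertone's documented usage: the helper names `pick_providers` and `describe_model`, the preference order of execution providers, and the example path are all hypothetical, and the actual file names and input tensors of Supertonic 2 are not given in this article.

```python
from typing import Iterable, List, Tuple

# Preference order is an assumption for illustration: try accelerator
# backends first, then fall back to the CPU backend.
PREFERRED = (
    "CUDAExecutionProvider",
    "CoreMLExecutionProvider",
    "CPUExecutionProvider",
)


def pick_providers(
    available: Iterable[str], preferred: Iterable[str] = PREFERRED
) -> List[str]:
    """Return the preferred providers that are actually available, in order."""
    available_set = set(available)
    chosen = [name for name in preferred if name in available_set]
    # Fall back to CPU if none of the preferred names matched.
    return chosen or ["CPUExecutionProvider"]


def describe_model(model_path: str) -> List[Tuple[str, list, str]]:
    """Open an ONNX file and report each input's name, shape, and type."""
    # Imported lazily so pick_providers can be used without onnxruntime.
    import onnxruntime as ort

    session = ort.InferenceSession(
        model_path,
        providers=pick_providers(ort.get_available_providers()),
    )
    return [(arg.name, arg.shape, arg.type) for arg in session.get_inputs()]
```

Calling `describe_model("path/to/model.onnx")` (a placeholder path) would show which tensors to feed before attempting inference. The same script can run on different hardware because only the provider list changes, not the model file.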

Developers can find the model and usage instructions in the official Hugging Face repository, Supertone/supertonic-2. Supertonic 2 is released under the OpenRAIL-M license, a popular choice for AI models that permits commercial use but restricts certain harmful applications, in line with responsible AI development practices.
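For readers who want to fetch the files programmatically, the following is a hedged sketch using the `huggingface_hub` library's `snapshot_download` function. The repository ID is taken from the Sources listing; the helper names `download_repo` and `find_onnx_files` are hypothetical, and because the repository's file layout is not described here, the sketch simply searches the download for `.onnx` files instead of assuming specific names.

```python
from pathlib import Path
from typing import List

# Repository name as listed under Sources.
REPO_ID = "Supertone/supertonic-2"


def download_repo(repo_id: str = REPO_ID) -> Path:
    """Download a snapshot of the repository and return its local folder."""
    # Imported lazily so find_onnx_files works without huggingface_hub.
    from huggingface_hub import snapshot_download

    return Path(snapshot_download(repo_id=repo_id))


def find_onnx_files(root: Path) -> List[Path]:
    """Return every .onnx file under root, sorted by path."""
    return sorted(p for p in Path(root).rglob("*.onnx") if p.is_file())
```

A typical call would be `find_onnx_files(download_repo())`. Review the license terms in the repository before using the model in a product.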

The availability of high-quality, multilingual, and deployment-ready TTS models is a critical step for building more accessible and global applications. Supertonic 2 provides a valuable new tool for developers looking to integrate voice into their products without relying on proprietary, closed-source APIs.

Sources

Supertone/supertonic-2
Hugging Face
