Supertone Releases On-Device Multilingual TTS Model

The new Supertonic 3 model supports seven languages and is optimized for local inference with the portable ONNX format.

May 6, 2026

AI audio company Supertone has released Supertonic 3, a new text-to-speech (TTS) model designed to generate speech directly on user devices. It targets applications that need local, low-latency voice synthesis without relying on cloud APIs.

Supertonic 3 is notable for its multilingual capabilities, supporting English, Korean, Japanese, Chinese, Spanish, French, and German. To maximize compatibility and performance on edge hardware, Supertone has released the model in the Open Neural Network Exchange (ONNX) format. This allows developers to integrate it more easily into a wide range of applications and platforms. Technical details and code samples are available on the project's Hugging Face repository.
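As a rough illustration of what working with an ONNX release can look like, the sketch below uses ONNX Runtime's Python API to list the input and output tensors declared by each `.onnx` file in a local folder. It is not taken from Supertone's code samples. The directory name `./supertonic-3` and the helper names `find_onnx_files` and `describe_onnx_model` are hypothetical, and the sketch deliberately does not guess at the model's file layout or tensor names, which the article does not state.

```python
from pathlib import Path


def find_onnx_files(model_dir):
    """Return every .onnx file under model_dir, sorted by path."""
    return sorted(Path(model_dir).rglob("*.onnx"))


def describe_onnx_model(model_path):
    """List the input and output tensors that an ONNX file declares."""
    # Imported lazily so the file-discovery helper works without onnxruntime.
    import onnxruntime as ort

    # CPUExecutionProvider ships with the default onnxruntime package.
    session = ort.InferenceSession(
        str(model_path), providers=["CPUExecutionProvider"]
    )
    return {
        "inputs": [(t.name, t.type, t.shape) for t in session.get_inputs()],
        "outputs": [(t.name, t.type, t.shape) for t in session.get_outputs()],
    }


if __name__ == "__main__":
    # Hypothetical local directory holding files downloaded from the repository.
    for path in find_onnx_files("./supertonic-3"):
        print(path, describe_onnx_model(path))
```

Inspecting the declared tensors first is a cautious starting point: the actual preprocessing, tensor names, and synthesis steps should come from the code samples in the project's repository.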

Why it matters

By focusing on on-device inference, Supertonic 3 addresses key concerns around privacy, cost, and latency. Running locally means user data doesn't need to be sent to a server, and the application can function without an internet connection. This makes it suitable for mobile apps, embedded systems, and other scenarios where self-contained AI is a priority.

The model is available under a Creative Commons Attribution-NonCommercial 4.0 (CC BY-NC 4.0) license. This makes it accessible for academic research and personal projects, but commercial use requires a separate agreement. The choice keeps Supertonic 3 freely available to researchers and hobbyists while protecting the company's commercial interests.

Sources

Supertone/supertonic-3 (Hugging Face)

