Supertone Releases On-Device Multilingual TTS Model
The new Supertonic 3 model supports seven languages and is optimized for local inference with the portable ONNX format.

AI audio company Supertone has released Supertonic 3, a new text-to-speech (TTS) model designed to generate high-quality speech directly on user devices. It targets applications that need local, low-latency voice synthesis without relying on cloud APIs.
Supertonic 3 supports seven languages: English, Korean, Japanese, Chinese, Spanish, French, and German. To maximize compatibility and performance on edge hardware, Supertone has released the model in the Open Neural Network Exchange (ONNX) format. Because an ONNX model can be loaded by any runtime that implements the format, such as ONNX Runtime, developers can integrate it across platforms without depending on the framework it was trained in. Technical details and code samples are available on the project's Hugging Face repository.
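As a rough illustration of what local ONNX inference setup can look like, the sketch below uses the ONNX Runtime Python package to pick an execution provider and print a model's input and output signatures. It is not taken from Supertone's code samples: the helper names, the provider preference order, and the `model.onnx` path are assumptions made for this example, and the actual file layout and tensor names should be checked in the Hugging Face repository.

```python
from typing import List, Sequence

# Preference order is an assumption for illustration: try hardware
# accelerators first, then fall back to the CPU provider.
PREFERRED_PROVIDERS = (
    "CUDAExecutionProvider",
    "CoreMLExecutionProvider",
    "CPUExecutionProvider",
)


def pick_providers(available: Sequence[str]) -> List[str]:
    """Return the preferred providers that are actually available, in order."""
    chosen = [p for p in PREFERRED_PROVIDERS if p in available]
    if not chosen:
        raise RuntimeError("no supported ONNX Runtime execution provider found")
    return chosen


def describe_model(model_path: str) -> None:
    """Load an ONNX file locally and print its input and output signatures."""
    # Imported lazily so pick_providers() works without onnxruntime installed.
    import onnxruntime as ort

    session = ort.InferenceSession(
        model_path,
        providers=pick_providers(ort.get_available_providers()),
    )
    for node in session.get_inputs():
        print("input ", node.name, node.shape, node.type)
    for node in session.get_outputs():
        print("output", node.name, node.shape, node.type)


# Example call with a placeholder path (not a file name from the
# Supertonic 3 repository):
#     describe_model("model.onnx")
```

Inspecting the signatures first is a cheap way to learn which tensors a model expects before wiring it into an app. No network access is involved once the model file is on disk.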
Why it matters
By focusing on on-device inference, Supertonic 3 addresses key concerns around privacy, cost, and latency. Running locally means user data never has to leave the device, there are no per-request cloud fees or network round trips, and the application can work without an internet connection. This makes it suitable for mobile apps, embedded systems, and other scenarios where self-contained AI is a priority.
The model is available under a Creative Commons Attribution-NonCommercial 4.0 (CC BY-NC 4.0) license. This makes it accessible for academic research and personal projects, but it cannot be used in commercial products without a separate agreement. The choice keeps Supertonic 3 freely available to researchers and hobbyists while protecting the company's commercial interests.
Sources
- Supertone/supertonic-3 (Hugging Face)