OpenBMB / Text → Speech

VoxCPM 1.5 Brings Open-Source Voice Cloning

The new 500-million-parameter text-to-speech model from OpenBMB supports both English and Chinese and can replicate a voice from a short audio sample.

Dec 5, 2025

Notable · Apache 2.0

Open-source text-to-speech has a new contender: VoxCPM 1.5, released by the OpenBMB research group. The model brings high-quality, zero-shot voice cloning to the community. It can generate speech in a target voice from a short reference clip, without any speaker-specific fine-tuning.

Built on the MiniCPM-4 architecture, VoxCPM 1.5 is a compact model with 500 million parameters. Because it is small, developers and researchers can run and fine-tune it on a wider range of hardware, which lowers the barrier to building custom speech applications.
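To give a rough sense of what "compact" means in practice, the arithmetic below estimates how much memory the weights alone occupy at common numeric precisions. This sketch assumes exactly 500 million parameters. The function name `weight_memory_gb` and the precision table are illustrative only and are not part of VoxCPM. Real inference also needs memory for activations and runtime buffers, so actual usage will be higher.

```python
# Rough, weights-only memory estimate for a ~500M-parameter model.
# Assumption: exactly 500 million parameters. Real checkpoints differ slightly,
# and inference needs extra memory for activations and runtime buffers.

# Bytes used to store one parameter at each (illustrative) precision.
BYTES_PER_PARAM = {
    "fp32": 4.0,
    "bf16": 2.0,
    "int8": 1.0,
    "int4": 0.5,
}


def weight_memory_gb(num_params: int, dtype: str) -> float:
    """Return weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    n_params = 500_000_000
    for dtype in BYTES_PER_PARAM:
        print(f"{dtype}: {weight_memory_gb(n_params, dtype):.2f} GB")
```

Running it prints 2.00 GB for fp32, 1.00 GB for bf16, 0.50 GB for int8, and 0.25 GB for int4. The lower-precision rows only illustrate the arithmetic. This article does not state whether quantized VoxCPM 1.5 variants are available.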

Bilingual Voice Synthesis

A key advantage of the model is that it is bilingual, supporting both English and Chinese within a single framework. It is also released under the permissive Apache 2.0 license, which allows commercial as well as research use. Together, these make it a practical choice for applications serving English- and Chinese-speaking users. Key features include:

Zero-shot voice cloning from brief audio clips.
Bilingual support for English and Chinese.
An efficient 500M-parameter architecture.

By releasing an openly licensed voice-synthesis model, OpenBMB opens the door to applications such as personalized digital assistants, accessibility tools, and creative content generation. Developers can find the model weights and documentation in the official Hugging Face repository, openbmb/VoxCPM1.5.

Sources

openbmb/VoxCPM1.5 (Hugging Face)
