Microsoft Releases VibeVoice for Speech Transcription
The new open-source automatic speech recognition model handles multilingual transcription and speaker identification out of the box.
The new 500-million-parameter model is designed for generating natural, long-form speech with very low latency for interactive applications.
The 7-billion-parameter model is designed to understand and interact with graphical user interfaces, building on Alibaba's open-source Qwen2.5-VL.
The new 7-billion-parameter model is designed for generating long-form, multi-speaker audio in English and Chinese under a permissive MIT license.
The new open-source model specializes in generating long-form, multi-speaker audio in both English and Mandarin, mimicking a natural podcast conversation.
The new 1.5-billion-parameter text-to-speech model is designed to generate natural, multi-speaker audio for podcasts and other long-form content.