MOSS-TTS Aims for More Robust Speech Synthesis
A new text-to-speech model introduces 'delay-pattern decoding' to reduce the word-skipping and repetition errors common in parallel generation.
The new open-source model from OpenMOSS-Team generates high-quality speech in multiple languages while keeping a small footprint.
The new system from the OpenMOSS Team uses a novel 'delay-pattern' architecture to generate natural-sounding speech in Chinese, English, and Japanese.
The new open model can generate high-definition video with synchronized audio from a flexible combination of text and image prompts.
The new model generates 360p video from text or images and creates corresponding audio tracks simultaneously, a notable step for integrated audiovisual synthesis.
The new text-to-speech model is optimized for the ONNX runtime, making it a promising option for efficient, on-device audio generation.
The new 80-million-parameter text-to-speech model adapts a powerful language model architecture for efficient, open-source audio generation.
The new text-to-speech model uses a hybrid diffusion and autoregressive architecture for high-quality, multilingual synthesis.
A new text-to-speech model from OpenMOSS leverages the Qwen2 architecture to generate speech in both English and Chinese.
The new 1.7-billion-parameter model from OpenMOSS is trained on conversational data to generate natural dialogue in English and Chinese.