Company

OpenMOSS

7 models

Categories: Vision-Language, Text → Speech, Speech → Text, Any-to-Any, Image → Video

Releases

OpenMOSS Debuts MOSS-VL-Realtime for Live Video

The Chinese research group's new vision-language model targets real-time understanding of streaming video and images, rather than analysis of isolated static frames.

Jul 14, 2026

Vision-Language, Any-to-Any

MOSS-TTS Aims for More Robust Speech Synthesis

A new text-to-speech model introduces 'delay-pattern decoding' to reduce the word-skipping and repetition errors common in parallel generation.

May 25, 2026

Text → Speech

OpenMOSS Releases Transcribe-Diarize ASR Model

The open-weights team behind MOSS turns to long-form speech recognition with built-in speaker diarization and timestamps.

May 19, 2026

Speech → Text

MOSS-TTS-Nano Delivers Multilingual Speech at 100M Params

The new open-source model from OpenMOSS-Team generates high-quality speech in multiple languages with only about 100M parameters.

Apr 2, 2026

Text → Speech

MOSS-TTS: A New Multilingual Text-to-Speech Model

The new system from the OpenMOSS Team uses a novel 'delay-pattern' architecture to generate natural-sounding speech in Chinese, English, and Japanese.

Feb 6, 2026

Text → Speech

OpenMOSS Releases MOVA, a 720p Multimodal Video Generator

The new open model can generate high-definition video with synchronized audio from a flexible combination of text and image prompts.

Jan 28, 2026

Any-to-Any, Image → Video


OpenMOSS Releases MOVA for Joint Video and Audio Generation

The new model generates 360p video from text or image prompts and produces a matching audio track simultaneously, combining video and audio synthesis in a single model.

Jan 28, 2026

Image → Video, Text → Video