OpenBMB Releases MiniCPM-V for On-Device Vision
The new open-source vision-language model is designed for high-resolution image understanding on mobile and edge devices.