OpenBMB/Any-to-Any

MiniCPM-o 4.5 Offers 'Any-to-Any' Multimodal AI

The new model from OpenBMB supports mixed-modality inputs and outputs, from text and images to audio and video, in a single efficient package.

Feb 2, 2026

Notable / Apache 2.0

OpenBMB has released MiniCPM-o 4.5, a new multimodal model built to work across several types of data at once. Unlike many vision-language models, which accept images and text but produce only text, MiniCPM-o is designed for 'any-to-any' communication.

This means the model can process a mix of inputs, such as text, images, and audio, and generate a combination of outputs in a single turn. The project describes this capability as 'full-duplex,' enabling more dynamic and complex interactions than traditional request-and-response models. This approach could support more sophisticated conversational agents and creative tools.

The release is available in the popular GGUF format, which is significant for developers and hobbyists. GGUF allows large models to run efficiently on consumer-grade hardware, including CPUs and GPUs, lowering the barrier to entry for experimenting with advanced multimodal AI. The model files and further details are in the openbmb/MiniCPM-o-4_5-gguf repository on Hugging Face.
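As a rough illustration of what "GGUF format" means at the file level, the sketch below reads the fixed-size header found at the start of a GGUF file: the four magic bytes `GGUF`, a version number, a tensor count, and a metadata key-value count. It assumes a little-endian file using GGUF version 2 or later, where the counts are 64-bit. The names `read_gguf_header` and `GGUFHeader` are hypothetical helpers for this example and do not belong to any library. The demo runs on a synthetic in-memory header, not on the actual MiniCPM-o files.

```python
import io
import struct
from typing import BinaryIO, NamedTuple

GGUF_MAGIC = b"GGUF"  # the first four bytes of every GGUF file


class GGUFHeader(NamedTuple):
    """Hypothetical container for the fixed-size GGUF header fields."""
    version: int
    tensor_count: int
    metadata_kv_count: int


def read_gguf_header(stream: BinaryIO) -> GGUFHeader:
    """Read the magic bytes and fixed header fields from a binary stream.

    Assumption: little-endian layout with 64-bit counts (GGUF v2 or later).
    """
    magic = stream.read(4)
    if magic != GGUF_MAGIC:
        raise ValueError(f"not a GGUF file: magic bytes were {magic!r}")
    fixed = stream.read(20)  # uint32 version + uint64 tensors + uint64 kv pairs
    if len(fixed) != 20:
        raise ValueError("truncated GGUF header")
    version, tensor_count, kv_count = struct.unpack("<IQQ", fixed)
    return GGUFHeader(version, tensor_count, kv_count)


# Demo on a synthetic header (version 3, 2 tensors, 5 metadata entries).
fake_file = io.BytesIO(GGUF_MAGIC + struct.pack("<IQQ", 3, 2, 5))
header = read_gguf_header(fake_file)
print(header)
```

To check a downloaded file instead, the same function could be called on `open(path, "rb")`, where `path` points at a local `.gguf` file.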

Released under the permissive Apache 2.0 license, MiniCPM-o 4.5 provides a powerful new building block for applications that require a deeper, more integrated understanding and generation of multiple media types.

Sources

openbmb/MiniCPM-o-4_5-gguf
Hugging Face


Thinking Machines Debuts Inkling Small, a Compact Multimodal MoE

The Apache-2.0 model brings mixture-of-experts efficiency to image, audio, and text tasks in a smaller footprint.

Jul 27, 2026

KRAFTON/Any-to-Any

KRAFTON releases A.X-K2 Raon speech MoE model

The game maker's new open model blends text-to-speech and speech recognition in a single 21B mixture-of-experts system with just 3B active parameters.

Jul 27, 2026

Microsoft/Vision-Language

Microsoft's Mage-VL Streams Video Natively

A codec-native multimodal foundation model aims to understand live video and vision-language input in real time.

Jul 26, 2026
