Baidu Releases Open Vision-Language MoE Model

The new ERNIE 4.5 VL model brings advanced multimodal reasoning to the open-source community with an efficient Mixture-of-Experts architecture.

Nov 7, 2025

Chinese tech giant Baidu has released ERNIE 4.5 VL, a new vision-language model available under the permissive Apache 2.0 license. The model targets reasoning tasks that require understanding both images and text, making it a new entry in the open multimodal space.

Efficient by Design

At its core, ERNIE 4.5 VL is a sparse Mixture-of-Experts (MoE) model. It contains 28 billion parameters in total but activates only a fraction of them, around 3 billion, for each token it processes. This design, reflected in the 'A3B' (Active 3 Billion) in its name, aims to provide the capacity of a much larger model while keeping the compute cost of inference closer to that of a small one.
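
To make the sparse-activation idea concrete, here is a minimal sketch in plain Python. It is not Baidu's implementation. The parameter counts come from the article, but the router, the expert count of 8, and the top-2 selection are hypothetical values chosen only for illustration. With the article's figures, the active share works out to roughly 10.7% of the total.

```python
import math
import random

TOTAL_PARAMS = 28e9   # total parameters, per the article
ACTIVE_PARAMS = 3e9   # approximate active parameters, per the article


def active_fraction(total: float, active: float) -> float:
    """Share of the model's parameters that is actually used."""
    return active / total


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    norm = sum(exps)
    return [e / norm for e in exps]


def route_token(router_scores, top_k):
    """Pick the top_k highest-scoring experts and renormalize their weights.

    Returns a list of (expert_index, weight) pairs whose weights sum to 1.
    Experts that are not picked do no work for this token.
    """
    probs = softmax(router_scores)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    mass = sum(probs[i] for i in chosen)
    return [(i, probs[i] / mass) for i in chosen]


if __name__ == "__main__":
    print(f"active share: {active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS):.1%}")

    rng = random.Random(0)
    scores = [rng.gauss(0.0, 1.0) for _ in range(8)]      # hypothetical: 8 experts
    for expert, weight in route_token(scores, top_k=2):   # hypothetical: top-2
        print(f"expert {expert}: weight {weight:.2f}")
```

The sketch only shows why compute scales with the active experts rather than with the full parameter count. The weights of all experts still have to be stored, so sparse activation reduces computation more than it reduces memory.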

The model's full name, ERNIE 4.5 VL 28B A3B Thinking, emphasizes its focus on multi-step reasoning. It is built to analyze visual information and reason about it step by step, a task that remains difficult for current AI systems.

By releasing this model under the Apache 2.0 license, Baidu is making a notable contribution to the open-source ecosystem. This gives researchers and developers a sophisticated, efficient, and freely usable foundation for building the next generation of multimodal applications.

Sources

baidu/ERNIE-4.5-VL-28B-A3B-Thinking (Hugging Face)