Zhipu AI Releases Open Vision Model GLM-4.5V

The new Mixture-of-Experts model offers strong multimodal reasoning capabilities under a permissive MIT license.

Aug 10, 2025

Chinese AI firm Zhipu AI has released GLM-4.5V, a new open-source vision-language model (VLM). The model uses a Mixture-of-Experts (MoE) architecture and is designed for tasks that require reasoning jointly over text and images.

According to the release notes, GLM-4.5V is built on the company's GLM-4.5-Air-Base model. Its key advance is what Zhipu AI describes as strong multimodal reasoning, which suits it to applications such as detailed image analysis, visual question answering, and generating text grounded in visual information. The model weights and code are available now on Hugging Face.

Why it matters

The release is significant for two main reasons. First, it adds a capable, openly accessible VLM to a field where proprietary models have often dominated. Second, the MIT license permits commercial use, modification, and redistribution with few conditions, which lowers the barrier for both commercial and research teams that want to build on or integrate the model.

The MoE architecture also points to an efficient design: a router activates only a few expert sub-networks for each token during inference, so just a fraction of the model's parameters do work on any given input. This can mean faster inference and lower compute cost than a dense model of similar capability, making advanced multimodal AI accessible to a wider range of developers and organizations.
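To make the idea of sparse expert activation concrete, here is a minimal sketch of top-k routing in plain Python. It is an illustration of the general MoE technique, not GLM-4.5V's actual router: the expert count, the value of k, the toy "experts" (simple scaling functions), and all function names are assumptions chosen for the example.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(router_logits, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts.

    Weights are a softmax over the selected logits only, so they sum to 1.
    """
    order = sorted(range(len(router_logits)),
                   key=lambda i: router_logits[i], reverse=True)
    top = order[:k]
    weights = softmax([router_logits[i] for i in top])
    return list(zip(top, weights))


def moe_forward(x, experts, router_logits, k):
    """Weighted sum of the selected experts' outputs.

    Experts that are not selected are never called, which is where the
    compute saving comes from.
    """
    routes = top_k_route(router_logits, k)
    out = [0.0] * len(x)
    for idx, weight in routes:
        y = experts[idx](x)
        out = [o + weight * yi for o, yi in zip(out, y)]
    return out, [idx for idx, _ in routes]


def make_expert(scale, calls, idx):
    """Toy expert: scales its input and records that it was invoked."""
    def expert(x):
        calls[idx] += 1
        return [scale * v for v in x]
    return expert


NUM_EXPERTS = 8  # hypothetical value, not GLM-4.5V's configuration
TOP_K = 2        # hypothetical value

calls = [0] * NUM_EXPERTS
experts = [make_expert(float(i + 1), calls, i) for i in range(NUM_EXPERTS)]

rng = random.Random(0)
token = [1.0, -2.0, 0.5]
# In a real model these scores come from a learned router layer.
router_logits = [rng.uniform(-1.0, 1.0) for _ in range(NUM_EXPERTS)]

output, used = moe_forward(token, experts, router_logits, TOP_K)
print(f"experts used: {sorted(used)} of {NUM_EXPERTS}")
print(f"expert calls: {sum(calls)}")
```

Only `TOP_K` of the `NUM_EXPERTS` toy experts run for the token, which is the source of the per-token compute saving. In a real deployment, all expert weights must still be held in memory, so the saving is in computation rather than storage.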

Sources

zai-org/GLM-4.5V (Hugging Face)
