Qwen · AlibabaVision-Language

Alibaba's Qwen team releases 4B vision-language model

The new Qwen3.5-4B model combines text and image understanding in a compact, permissively licensed package for developers.

Feb 27, 2026

NotableApache 2.0

The Qwen team at Alibaba has released Qwen3.5-4B, a new 4-billion-parameter model that marks the debut of their Qwen3.5 family. Released under the permissive Apache 2.0 license, the model is designed for instruction-following tasks that involve both text and images.

As a vision-language model (VLM), Qwen3.5-4B can process and understand visual information alongside natural language prompts. This enables applications like describing the content of a photo, answering questions about an image, or following complex instructions that reference visual elements.

The release is notable for its combination of compact size and open access. A 4B parameter count makes the model more accessible for developers and researchers who may not have access to large-scale GPU clusters. Paired with the commercially-friendly license, Qwen3.5-4B represents another strong contender in the growing field of smaller, capable open-source multimodal models.

This release is the first in the Qwen3.5 series, setting the stage for potential future models in the family. Developers can access the model card, weights, and usage examples directly from the official Hugging Face repository.

Sources

Qwen/Qwen3.5-4B
Hugging Face
Visit

0 comments

No comments yet. Be the first to weigh in.

Thinking Machines Debuts Inkling Small, a Compact Multimodal MoE

The Apache-2.0 model brings mixture-of-experts efficiency to image, audio, and text tasks in a smaller footprint.

Jul 27, 2026

Microsoft/Vision-Language

Microsoft's Mage-VL Streams Video Natively

A codec-native multimodal foundation model aims to understand live video and vision-language input in real time.

Jul 26, 2026

Swiss Ai/Text / LLM

Apertus v1.5 70B arrives with an Apache-2.0 license

Switzerland's open-model effort ships a 70-billion-parameter, multilingual and multimodal system that anyone can use, modify, and deploy.

Jul 24, 2026