Alibaba's Qwen team releases 4B vision-language model
The new Qwen3.5-4B model combines text and image understanding in a compact, permissively licensed package for developers.
The Qwen team at Alibaba has released Qwen3.5-4B, a new 4-billion-parameter model that marks the debut of their Qwen3.5 family. Released under the permissive Apache 2.0 license, the model is designed for instruction-following tasks that involve both text and images.
As a vision-language model (VLM), Qwen3.5-4B can process and understand visual information alongside natural language prompts. This enables applications like describing the content of a photo, answering questions about an image, or following complex instructions that reference visual elements.
The release is notable for its combination of compact size and open access. A 4B parameter count makes the model more accessible for developers and researchers who may not have access to large-scale GPU clusters. Paired with the commercially-friendly license, Qwen3.5-4B represents another strong contender in the growing field of smaller, capable open-source multimodal models.
This release is the first in the Qwen3.5 series, setting the stage for potential future models in the family. Developers can access the model card, weights, and usage examples directly from the official Hugging Face repository.
Sources
- Visit
Qwen/Qwen3.5-4B
Hugging Face
0 comments
No comments yet. Be the first to weigh in.
More in Vision-Language
Moonshot AI Releases Kimi, a Multimodal Coding Model
The new Mixture-of-Experts model from the Chinese AI company can generate code while also understanding visual inputs, a rare combination in open models.
Google Releases Open-Source DiffusionGemma 26B Model
The new 26B parameter model from DeepMind uses a diffusion-based architecture, a technique more common in image generation, to produce text.

MiniMax Releases M3, a Multimodal MoE Model
The new open-weight model from MiniMax AI combines vision, coding, and reasoning using a Mixture-of-Experts architecture.