Qwen Releases 30B MoE Vision Model, Qwen3-VL
The new open-source model from Alibaba uses a Mixture-of-Experts architecture to make its powerful vision-language capabilities more efficient to run.
Alibaba's Qwen team has released Qwen3-VL, a new open-source vision-language model (VLM) that combines high performance with computational efficiency. This instruction-tuned model is designed to understand and process both text and images, making it suitable for a wide range of multimodal tasks.
The model's key innovation is its Mixture-of-Experts (MoE) architecture. It contains 30 billion parameters in total, but only about 3 billion are activated for any given token during inference (the "A3B" in the model name). This design aims for quality closer to that of a much larger dense model at roughly the per-token compute cost of a 3-billion-parameter one, a significant advantage for developers and researchers. All 30 billion parameters must still be loaded into memory, so the savings are mainly in compute and speed rather than in memory footprint.
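To make the idea of sparse activation concrete, here is a minimal sketch of top-k expert routing, the general technique behind MoE layers. It is an illustration only and not Qwen3-VL's actual implementation: the expert count, the top-k value, the router scores, and the toy scalar "experts" are all hypothetical. Only the 30 billion and 3 billion figures come from the article.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(router_logits, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:k]
    # Renormalize over the chosen experts so their weights sum to 1.
    weights = softmax([router_logits[i] for i in chosen])
    return list(zip(chosen, weights))


def moe_layer(x, experts, router_logits, k):
    """Weighted sum of the selected experts' outputs; other experts never run."""
    return sum(w * experts[i](x) for i, w in top_k_route(router_logits, k))


# Hypothetical toy setup: 8 scalar "experts", 2 active per token.
NUM_EXPERTS, TOP_K = 8, 2
experts = [lambda x, scale=i + 1: scale * x for i in range(NUM_EXPERTS)]
router_logits = [0.1, 2.0, -1.0, 0.5, 1.5, 0.0, -0.3, 1.2]  # made-up scores

selected = top_k_route(router_logits, TOP_K)
output = moe_layer(1.0, experts, router_logits, TOP_K)

# Figures from the article: about 3B of 30B parameters are active per token.
active_fraction = 3e9 / 30e9
print(f"experts used: {[i for i, _ in selected]}")
print(f"active parameter fraction: {active_fraction:.0%}")
```

In a real model the router scores are produced by a small learned layer for each token, and each expert is a feed-forward network rather than a scalar multiplier. The principle is the same: only the selected experts do any work, which is why per-token compute tracks the active parameter count rather than the total.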
As an instruction-tuned model, Qwen3-VL is optimized for conversational and task-oriented applications. It can follow complex commands that involve analyzing visual content, such as answering detailed questions about an image or generating descriptive captions. This makes it a powerful tool for building more sophisticated AI assistants and applications.
The model is released under the permissive Apache 2.0 license, encouraging broad adoption for both academic and commercial projects. Full details and model weights are available on its Hugging Face repository.
Sources
- Qwen/Qwen3-VL-30B-A3B-Instruct (Hugging Face)