Qwen Releases 'Thinking' Multimodal MoE Model

The new 30-billion-parameter Mixture-of-Experts model from Alibaba's Qwen team is designed to show its reasoning process for complex multimodal tasks.

Sep 15, 2025

Alibaba's Qwen team has introduced a new model designed for transparent reasoning, called Qwen3-Omni-30B-A3B-Thinking. This release is a specialized variant of the broader Qwen3-Omni family, focusing on tasks that require complex, multi-step logic across various data types.

The model's key feature is that it outputs its "chain of thought," a step-by-step trace of its reasoning, alongside the final answer. Developers can read that trace to understand, debug, and guide the model's decision-making, since it shows the path the model took to reach a conclusion.
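To make the idea of a separate reasoning trace concrete, here is a minimal sketch of how a developer might split the trace from the final answer. It assumes the trace is delimited by `<think>` and `</think>` tags, as in other Qwen3 "thinking" releases; the article does not specify the format, so check the model card before relying on it. The function name `split_reasoning` is hypothetical.

```python
def split_reasoning(raw_output: str) -> tuple[str, str]:
    """Return (reasoning_trace, final_answer) from a raw model response.

    Assumption: the trace ends with a closing </think> tag. The opening
    <think> tag is treated as optional, since some chat templates emit it
    as part of the prompt rather than the response.
    """
    head, sep, tail = raw_output.partition("</think>")
    if not sep:
        # No closing tag found: treat the whole response as the answer.
        return "", raw_output.strip()
    reasoning = head.replace("<think>", "", 1).strip()
    return reasoning, tail.strip()


sample = "<think>\nThe chart peaks in March.\n</think>\n\nMarch had the highest sales."
trace, answer = split_reasoning(sample)
print("trace: ", trace)
print("answer:", answer)
```

Keeping the two parts separate lets an application log the trace for debugging while showing end users only the answer.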

Efficient Architecture, Broad Capabilities

Under the hood, Qwen3-Omni is a Mixture-of-Experts (MoE) model. It contains 30 billion parameters in total but activates only about 3 billion of them for each token it processes, which is what the "A3B" in the model name refers to. The architecture aims to combine the knowledge capacity of a large model with inference costs closer to those of a much smaller one.
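The following toy sketch illustrates the general top-k routing idea behind MoE layers: a gate scores every expert, but only the best few actually run. It is not Qwen's implementation. The expert count, vector size, and the helper names `top_k_route` and `moe_forward` are all hypothetical and chosen for illustration only.

```python
import math
import random


def top_k_route(gate_logits, k):
    """Pick the k highest-scoring experts and softmax-normalize their weights."""
    top = sorted(range(len(gate_logits)), key=lambda i: gate_logits[i], reverse=True)[:k]
    peak = max(gate_logits[i] for i in top)  # subtract the max for numerical stability
    exps = [math.exp(gate_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def moe_forward(x, experts, gate_weights, k):
    """Run one token vector through only its k selected experts."""
    gate_logits = [sum(w * xi for w, xi in zip(row, x)) for row in gate_weights]
    routes = top_k_route(gate_logits, k)
    out = [0.0] * len(x)
    for idx, weight in routes:
        y = experts[idx](x)  # experts that were not selected are never evaluated
        out = [o + weight * yi for o, yi in zip(out, y)]
    return out, [idx for idx, _ in routes]


def make_expert(scale):
    """Stand-in for an expert network: simply scales its input."""
    return lambda x: [scale * xi for xi in x]


# Toy configuration (hypothetical, not the model's real settings).
random.seed(0)
DIM, NUM_EXPERTS, TOP_K = 4, 8, 2
experts = [make_expert(s + 1) for s in range(NUM_EXPERTS)]
gate_weights = [[random.uniform(-1, 1) for _ in range(DIM)] for _ in range(NUM_EXPERTS)]

token = [0.5, -1.0, 0.25, 2.0]
output, used = moe_forward(token, experts, gate_weights, TOP_K)
print(f"experts used: {used} ({len(used)} of {NUM_EXPERTS})")

# Headline ratio from the article: about 3B of 30B parameters active.
print(f"active share: {3e9 / 30e9:.0%}")
```

All 30 billion parameters still have to be held in memory, so the saving is in compute per token rather than in storage.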

As an "omni-modal" model, its capabilities extend beyond text to a wide range of inputs:

Image and video understanding
Audio processing
Document analysis

This versatility makes it suitable for complex applications that need to synthesize information from multiple sources. The model and its weights are available for developers to explore on its Hugging Face repository. It's released under a custom license agreement, so users should review the terms before deployment.

Sources

Qwen/Qwen3-Omni-30B-A3B-Thinking (Hugging Face)
