Qwen · AlibabaText / LLM

Qwen Releases 80B Mixture-of-Experts Model

The new Qwen3-Next model from Alibaba combines a large parameter count with an efficient MoE architecture to balance performance and computational cost.

Sep 9, 2025

Major releaseApache 2.0

The Qwen team at Alibaba has released Qwen3-Next-80B-A3B-Instruct, a new large language model that employs a Mixture-of-Experts (MoE) architecture. This release marks the introduction of the Qwen3-Next series, signaling a focus on more computationally efficient designs for powerful models.

The key feature of this model is its MoE structure. While it contains a total of 80 billion parameters, only 3 billion are activated for processing any given token. This design aims to provide the knowledge and nuance of a very large model while keeping inference costs significantly lower, making it more accessible for a wider range of applications and hardware setups.

Technical Specifications

Beyond its architecture, Qwen3-Next is an instruction-tuned model designed for chat and task completion. It supports a context length of up to 65,536 tokens, making it suitable for tasks requiring long-form context and analysis. The model is built on a standard Transformer foundation with SwiGLU activations and Group Query Attention for efficiency.

Released under the permissive Apache 2.0 license, the Qwen3-Next-80B-A3B-Instruct model is available for both research and commercial use. This continues the trend of major AI labs contributing powerful, open models that allow developers to build without restrictive licensing, fostering broader innovation in the ecosystem.

Sources

Qwen/Qwen3-Next-80B-A3B-Instruct
Hugging Face
Visit

0 comments

No comments yet. Be the first to weigh in.

Meituan Ships a Lighter, Sparser LongCat-Flash

The food-delivery giant's newest open model trims its mixture-of-experts design for more efficient inference under an MIT license.

Jul 31, 2026

DeepSeek/Text / LLM

DeepSeek Refreshes V4-Flash With New 0731 Checkpoint

The MIT-licensed mixture-of-experts model returns in an updated build shipping with FP8 weights for cheaper inference.

Jul 31, 2026

DeepSeek/Text / LLM

DeepSeek Ships V4-Flash, a 304B MoE Tuned for Agents

The latest checkpoint in DeepSeek's V4 line leans into agentic workflows while keeping the permissive MIT license.

Jul 31, 2026

Technical Specifications