MiniMax/Vision-Language

MiniMax Releases M3, a Multimodal MoE Model

The new open-weight model from MiniMax AI combines vision, coding, and reasoning using a Mixture-of-Experts architecture.

Jun 2, 2026

Major release / Other

MiniMax AI has released MiniMax-M3, a new open-weight multimodal foundation model, available now on Hugging Face. The model is built on a Mixture-of-Experts (MoE) architecture. This design is commonly used to scale a model's total parameter count without a proportional increase in the compute spent on each token.

M3 is designed as a general-purpose system that handles a range of inputs and tasks. Its headline capability is multimodal understanding: it can process and interpret both text and images. Beyond vision, the model is also built for coding and complex reasoning, which positions it as a foundation for more capable AI agents.

The MoE architecture is the key technical detail. Instead of running every parameter on every input, a learned router sends each token to a small subset of sub-networks, or 'experts', and only those experts are computed. This lets the model hold a very large number of parameters while keeping inference costs manageable, which makes its capabilities accessible to a wider range of developers and researchers.
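The routing idea can be sketched in a few lines of Python. This is a generic top-k gating sketch, not MiniMax-M3's actual router: the dot-product router, the choice of k = 2 and the toy "experts" are hypothetical, and production MoE layers add load-balancing losses, capacity limits and batched kernels on top of this.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(router_logits, k):
    """Pick the k highest-scoring experts and renormalize their scores.

    Returns a list of (expert_index, weight) pairs whose weights sum to 1.
    """
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:k]
    weights = softmax([router_logits[i] for i in chosen])
    return list(zip(chosen, weights))


def moe_layer(token, router_weights, experts, k=2):
    """Hypothetical MoE layer: route one token to k experts and mix their outputs.

    router_weights holds one vector per expert; each router logit is the
    dot product of the token with that expert's vector.
    """
    logits = [sum(t * w for t, w in zip(token, wv)) for wv in router_weights]
    routes = top_k_route(logits, k)
    out = [0.0] * len(token)
    for idx, weight in routes:
        expert_out = experts[idx](token)  # only the k chosen experts are evaluated
        out = [o + weight * e for o, e in zip(out, expert_out)]
    return out, [idx for idx, _ in routes]


# Toy example: 4 experts, each simply scales its input by a different factor.
experts = [lambda x, s=s: [s * v for v in x] for s in (1.0, 2.0, 3.0, 4.0)]
router = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, -1.0]]
output, used = moe_layer([1.0, 2.0], router, experts, k=2)
print(used)  # [2, 1]
```

Here two of the four experts run for the token, and the other two are never evaluated. With, say, 64 experts and k = 2 (again hypothetical numbers), each token would touch only 2/64 of that layer's expert weights, which is why total parameter count and per-token compute can diverge.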

While the model's weights are publicly available, the model is released under a custom license rather than a permissive open-source license. This 'open-weight' approach requires users to agree to specific terms of use, reflecting a growing trend in which companies release powerful models with some restrictions attached.

Sources

MiniMaxAI/MiniMax-M3 (Hugging Face)
MiniMax M3: The First Open-Weights Model to Combine Three Frontier Capabilities (Hacker News)

