MiniMax Releases M3, a Multimodal MoE Model
The new open-weight model from MiniMax AI combines vision, coding, and reasoning using a Mixture-of-Experts architecture.

MiniMax AI has released MiniMax-M3, a new open-weight multimodal foundation model. Available now on Hugging Face, the model is built on a Mixture-of-Experts (MoE) architecture, a design often used to scale up a model's parameter count without a proportional increase in computational cost.
M3 is designed as a general-purpose system. It accepts both text and images as input, and it is also built for coding and complex reasoning, which positions it as a foundation for more capable AI agents.
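The MoE architecture is the key technical detail here. Instead of running every input through the whole network, an MoE layer typically uses a router to send each token to a small subset of specialized sub-networks, or 'experts'. Because only those experts run, the model can hold a very large total parameter count while the compute per token stays closer to that of a much smaller dense model. That keeps inference costs manageable and makes the model's capabilities accessible to a wider range of developers and researchers.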
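To make the routing idea concrete, here is a minimal sketch of top-k expert selection in plain Python. It is an illustration of the general MoE technique, not M3's implementation: the article does not state M3's expert count, router design, or how many experts are active per token, so `NUM_EXPERTS`, `TOP_K`, `DIM`, and the toy experts below are all assumed values chosen for readability.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_logits, k):
    """Pick the k highest-scoring experts and normalize their weights."""
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    top = ranked[:k]
    weights = softmax([router_logits[i] for i in top])
    return list(zip(top, weights))


def moe_layer(token, router, experts, k=2):
    """Run one token through only its k selected experts."""
    # The router scores every expert, which is cheap: one dot product each.
    logits = [sum(w * x for w, x in zip(row, token)) for row in router]
    chosen = route_top_k(logits, k)

    # Only the chosen experts execute; the rest are skipped entirely.
    out = [0.0] * len(token)
    for idx, weight in chosen:
        expert_out = experts[idx](token)
        out = [o + weight * e for o, e in zip(out, expert_out)]
    return out, [idx for idx, _ in chosen]


def make_expert(scale):
    """Toy stand-in for an expert network: scales its input."""
    return lambda vec: [scale * x for x in vec]


# Illustrative sizes only (assumptions, not M3's configuration).
DIM, NUM_EXPERTS, TOP_K = 4, 8, 2

random.seed(0)
router = [[random.uniform(-1, 1) for _ in range(DIM)]
          for _ in range(NUM_EXPERTS)]
experts = [make_expert(n) for n in range(1, NUM_EXPERTS + 1)]

token = [1.0, 0.5, -0.5, 2.0]
output, used = moe_layer(token, router, experts, k=TOP_K)
print(f"experts used: {used} of {NUM_EXPERTS}")
```

In this sketch, only `TOP_K` of the `NUM_EXPERTS` experts run for a given token, which is where the compute savings come from. All expert weights still have to be stored, so the total parameter count, and the memory needed to hold it, stays large.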
While the model's weights are publicly available, it's released under a custom license rather than a permissive open-source license. This 'open-weight' approach requires users to agree to specific terms for use, reflecting a growing trend where companies release powerful models with some restrictions.
Sources
- MiniMaxAI/MiniMax-M3 (Hugging Face)