inclusionAI Debuts 'Any-to-Any' Multimodal MoE Model
The new Ming-flash-omni-Preview aims to handle any combination of data modalities using an efficient Mixture of Experts architecture.
AI research group inclusionAI has released Ming-flash-omni-Preview, a new open-source model designed to work across modalities. Released under the permissive MIT license, the model targets an "any-to-any" capability: it is built to accept and generate many combinations of data types, not just text and images.
This approach, often called "omnimodal," goes beyond models that are limited to specific input-output pairs, such as text-to-image or audio-to-text. In principle, an any-to-any system can accept a mix of inputs (say, an image, a line of text, and an audio clip) and generate a relevant output in a requested modality.
The model is built on a Mixture of Experts (MoE) architecture, a technique that improves computational efficiency by routing each token to a small subset of specialized subnetworks, or "experts," rather than engaging the entire model for every token. According to the release card, Ming-flash-omni is based on a previous model called Ling-flash-2.0.
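To make the routing idea concrete, here is a minimal sketch of top-k MoE gating in plain Python. It is an illustration of the general technique only, not Ming-flash-omni's actual router: the function names, the toy "experts" (which simply scale their input), the expert count, and the choice of top_k=2 are all assumptions made for this example.

```python
import math
import random


def softmax(scores):
    # Numerically stable softmax over a list of floats.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def make_expert(scale):
    # Hypothetical toy expert: multiplies every feature by a fixed scale.
    # A real expert would be a feed-forward subnetwork.
    return lambda token: [scale * x for x in token]


def route_token(token, gate_weights, experts, top_k=2):
    # Router score per expert: dot product of the token with that expert's gate vector.
    scores = [sum(w * x for w, x in zip(row, token)) for row in gate_weights]
    # Select the top_k highest-scoring experts; the rest are never evaluated.
    chosen = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:top_k]
    # Normalize the gate weights over the selected experts only.
    probs = softmax([scores[i] for i in chosen])
    # Combine the selected experts' outputs, weighted by the gate.
    output = [0.0] * len(token)
    for prob, idx in zip(probs, chosen):
        expert_out = experts[idx](token)
        output = [o + prob * e for o, e in zip(output, expert_out)]
    return output, chosen


# Illustrative sizes only; these are not the model's real dimensions.
random.seed(0)
dim, num_experts = 4, 8
experts = [make_expert(scale=i + 1) for i in range(num_experts)]
gate_weights = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(num_experts)]

token = [0.5, -1.0, 0.25, 2.0]
output, chosen = route_token(token, gate_weights, experts, top_k=2)
print(f"experts used: {chosen} of {num_experts}")
```

In this sketch only two of the eight experts run for the token, which is where the efficiency gain comes from: the total parameter count can grow with the number of experts while the compute per token stays roughly tied to top_k.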
As major labs pursue closed, highly capable omnimodal models, the release of an open alternative like Ming-flash-omni-Preview provides researchers and developers with a valuable tool for experimentation. While labeled as a preview, it offers a foundational component for building applications that require a more fluid and comprehensive understanding of diverse data streams.
Sources
- inclusionAI/Ming-flash-omni-Preview (Hugging Face)