inclusionAI/Any-to-Any

inclusionAI Debuts 'Any-to-Any' Multimodal MoE Model

The new Ming-flash-omni-Preview aims to handle any combination of data modalities using an efficient Mixture of Experts architecture.

Oct 14, 2025

Notable | MIT

AI research group inclusionAI has released Ming-flash-omni-Preview, an open-source multimodal model published under the permissive MIT license. The model targets an "any-to-any" capability: it is built to process and generate many combinations of data types, not just text and images.

This approach, often called "omnimodal," goes beyond models limited to fixed input-output pairs such as text-to-image or audio-to-text. An any-to-any system can, in principle, accept a mix of inputs (say, an image, a line of text, and an audio clip) and generate a relevant output in a requested modality.

The model is built on a Mixture of Experts (MoE) architecture, a technique that improves computational efficiency by routing each token to a small subset of specialized subnetworks, or "experts," rather than engaging the entire model for every token. According to the release card, Ming-flash-omni-Preview is based on an earlier model called Ling-flash-2.0.
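To make the routing idea concrete, here is a minimal sketch of top-k expert gating, one common way MoE layers are formulated. It is an illustration only: the release card excerpted here does not describe Ming-flash-omni-Preview's actual router, and the names `route_token`, `moe_layer`, and `make_scaling_expert`, the toy gate vectors, and the choice of `top_k=2` are all assumptions made for this example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(token, gate_weights, top_k=2):
    """Pick the top_k experts for one token.

    Returns a list of (expert_index, weight) pairs whose weights sum to 1.
    gate_weights holds one hypothetical gate vector per expert.
    """
    # One gating score per expert: dot product of token and gate vector.
    scores = [sum(t * w for t, w in zip(token, gate)) for gate in gate_weights]
    probs = softmax(scores)
    # Keep only the k most probable experts; the rest are never evaluated.
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in chosen)
    return [(i, probs[i] / norm) for i in chosen]


def moe_layer(token, gate_weights, experts, top_k=2):
    """Run a token through only its selected experts and mix their outputs."""
    routes = route_token(token, gate_weights, top_k)
    out = [0.0] * len(token)
    for idx, weight in routes:
        expert_out = experts[idx](token)  # only top_k experts do any work
        out = [o + weight * e for o, e in zip(out, expert_out)]
    return out, [idx for idx, _ in routes]


def make_scaling_expert(scale):
    """Toy stand-in for an expert network: multiplies the token by a constant."""
    return lambda token: [scale * t for t in token]


# Toy setup: four experts, of which only two run per token.
gates = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0], [0.5, 0.5]]
experts = [make_scaling_expert(s) for s in (1.0, 2.0, 3.0, 4.0)]
output, used_experts = moe_layer([2.0, 0.0], gates, experts, top_k=2)
```

In this toy setup each token touches two of the four experts, so roughly half of the expert computation is skipped. Production MoE models apply the same principle with learned gates and many more experts.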

As major labs pursue closed, highly capable omnimodal models, an open alternative like Ming-flash-omni-Preview gives researchers and developers a model they can inspect and experiment with. Although labeled a preview, it offers a starting point for building applications that need to work across diverse data streams.

Sources

inclusionAI/Ming-flash-omni-Preview
Hugging Face

