inclusionAI/Any-to-Any

inclusionAI's Ming 2.0 Tackles Any-to-Any Multimodality

The new open-source Mixture-of-Experts model can process and generate content across text, images, and audio in any combination.

Feb 10, 2026

Notable/MIT

AI research group inclusionAI has released Ming-flash-omni 2.0, an ambitious open-source model designed to natively handle text, images, and audio. Released under a permissive MIT license, the model aims to provide a single, unified system for 'any-to-any' multimodal tasks.

Unlike many multimodal models that primarily link text and images, Ming 2.0 is built to process and generate content across all three modalities interchangeably. This could enable capabilities like generating an image from an audio clip, describing a picture with spoken words, or transcribing speech, all within one framework.

The model uses a Mixture-of-Experts (MoE) architecture, a design that can make computation more efficient by activating only a subset of the network's expert sub-networks for a given input. Specific details on its parameter count and training data are not yet public, but the MoE approach suggests a focus on scalable performance.
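To make the idea of activating only part of the network concrete, here is a minimal sketch of top-k expert routing, one common way MoE layers are built. It is not Ming-flash-omni 2.0's actual implementation, whose routing details are not public. The function names, the toy experts, and the choice of k=2 are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(router_logits, k):
    """Pick the k highest-scoring experts and renormalize their weights.

    Returns a list of (expert_index, weight) pairs whose weights sum to 1.
    """
    probs = softmax(router_logits)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in chosen)
    return [(i, probs[i] / norm) for i in chosen]


def moe_layer(x, experts, router_weights, k=2):
    """Run a toy MoE layer on one input vector x.

    router_weights holds one weight vector per expert (hypothetical layout);
    each router logit is the dot product of that vector with x.
    Only the k selected experts are evaluated; the rest are skipped.
    """
    logits = [sum(w_j * x_j for w_j, x_j in zip(w, x)) for w in router_weights]
    routes = top_k_route(logits, k)
    out = [0.0] * len(x)
    for idx, weight in routes:
        y = experts[idx](x)  # only selected experts do any work
        out = [o + weight * y_j for o, y_j in zip(out, y)]
    return out, [idx for idx, _ in routes]


def make_expert(scale):
    """Toy stand-in for an expert network: scales its input."""
    return lambda x: [scale * v for v in x]


if __name__ == "__main__":
    experts = [make_expert(s) for s in (1.0, 2.0, 3.0, 4.0)]
    router_weights = [[0.0, 0.0], [2.0, 0.0], [1.0, 0.0], [-1.0, 0.0]]
    output, used = moe_layer([1.0, 0.0], experts, router_weights, k=2)
    print("experts used:", used)
    print("output:", output)
```

In this toy setup only two of the four experts run for the input, which is the source of the efficiency: total parameter count can grow with the number of experts while the compute per input stays tied to k.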

This release is another step forward for complex open-source AI systems that can perceive and create across several modalities at once, closer to how human senses work together. Researchers and developers can explore the model's capabilities on its Hugging Face repository.

Sources

inclusionAI/Ming-flash-omni-2.0
Hugging Face

Thinking Machines Debuts Inkling Small, a Compact Multimodal MoE

The Apache-2.0 model brings mixture-of-experts efficiency to image, audio, and text tasks in a smaller footprint.

Jul 27, 2026

KRAFTON/Any-to-Any

KRAFTON releases A.X-K2 Raon speech MoE model

The game maker's new open model blends text-to-speech and speech recognition in a single 21B mixture-of-experts system with just 3B active parameters.

Jul 27, 2026

Microsoft/Vision-Language

Microsoft's Mage-VL Streams Video Natively

A codec-native multimodal foundation model aims to understand live video and vision-language input in real time.

Jul 26, 2026
