The Open Weights

The daily record of open-source AI. New model releases, leaderboards, and what's coming next — written for people who ship.

Refreshed every 12 hours


© 2026 The Open Weights. An independent publication.

Aggregated by Claude · written with Gemini · curated by humans.


Latest Any-to-Any models

The newest open-source Any-to-Any releases, from across the ecosystem.


46 releases

MiniMax/Vision-Language · Major release

MiniMax Releases M3, a Multimodal MoE Model

The new open-weight model from MiniMax AI combines vision, coding, and reasoning using a Mixture-of-Experts architecture.

Jun 2, 2026
Code · Any-to-Any
MiniMax-M3
Google DeepMind/Any-to-Any · Major release

Google Releases Gemma 4 12B Multimodal Model

The new 12-billion-parameter open model from DeepMind introduces a unified 'any-to-any' architecture for advanced multimodal tasks.

May 23, 2026
Text / LLM · Any-to-Any
Gemma 4 12B
Google DeepMind/Any-to-Any · Major release

Google Releases Gemma 4, a 12B 'Any-to-Any' Model

The new 12-billion-parameter model from Google DeepMind is designed to handle a flexible mix of data types, moving beyond traditional text and image inputs.

May 23, 2026
Text / LLM · Any-to-Any
Gemma 4 12B
ByteDance/Any-to-Any · Major release

ByteDance Releases Lance, a Unified Generative AI Model

The 3-billion-parameter model handles image and video generation, editing, and understanding from a single set of weights under a permissive license.

May 15, 2026
Image Editing · Any-to-Any
Lance
SenseTime/Any-to-Any

SenseTime Releases 8B 'Any-to-Any' Infographic Model

The new 8B-parameter SenseNova U1 model from SenseTime is designed for complex multimodal tasks, including the in-conversation generation and editing of infographics.

May 14, 2026
Image Editing · Any-to-Any
SenseNova U1 8B MoT Infographic
NVIDIA/Any-to-Any

NVIDIA Releases Efficient Nemotron-3 Multimodal MoE

The new 30-billion-parameter Mixture-of-Experts model handles text and images while using only 3 billion active parameters for inference.

Apr 24, 2026
Any-to-Any · Reasoning
Nemotron-3 Nano Omni 30B-A3B Reasoning
Google DeepMind/Any-to-Any

Google Releases Gemma 4 Multimodal Open Model

The new 26-billion-parameter model from DeepMind uses a mixture-of-experts design for greater efficiency and is tuned for assistant-style tasks.

Apr 23, 2026
Text / LLM · Any-to-Any
Gemma 4 26B-A4B Instruct (MoE)
Google DeepMind/Any-to-Any · Major release

Google Releases Multimodal Gemma 4 31B Model

The new 31-billion-parameter model is an instruction-tuned 'any-to-any' model released under a permissive Apache 2.0 license.

Apr 23, 2026
Text / LLM · Any-to-Any
Gemma 4 31B
Google DeepMind/Any-to-Any

Google Releases 4B Multimodal Gemma 4 Assistant

The new 4-billion-parameter model is instruction-tuned for 'any-to-any' tasks, handling a flexible mix of data types.

Apr 23, 2026
Text / LLM · Any-to-Any
Gemma 4 E4B-it Assistant
Google DeepMind/Any-to-Any

Google Releases 2B Multimodal Gemma 4 Assistant Model

The new compact model from DeepMind is instruction-tuned for "any-to-any" tasks, capable of processing and generating mixed data types.

Apr 23, 2026
Text / LLM · Any-to-Any
Gemma 4 E2B-it Assistant
inclusionAI/Any-to-Any

LLaDA2.0-Uni: A Unified MoE for Vision Tasks

The new open-source model from inclusionAI uses a Mixture-of-Experts architecture to handle multiple vision tasks in a single, diffusion-based system.

Apr 22, 2026
Image Editing · Any-to-Any
LLaDA2.0-Uni
SenseTime/Any-to-Any

SenseTime Releases 8B Any-to-Any Multimodal Model

The new SenseNova-U1 model unifies image understanding, generation, and editing within a single 8-billion-parameter framework.

Apr 22, 2026
Image Editing · Any-to-Any
SenseNova-U1-8B-MoT
NVIDIA/Any-to-Any

NVIDIA Releases Nemotron-3-Nano Omni-Modal MoE

The new 30-billion-parameter Mixture-of-Experts model handles any combination of modalities with just 3 billion active parameters.

Apr 20, 2026
Any-to-Any · Reasoning
Nemotron-3 Nano Omni 30B-A3B Reasoning
KRAFTON/Any-to-Any

KRAFTON Releases 9B Bilingual Speech Model

The gaming giant behind 'PUBG' has released Raon-Speech-9B, a multimodal model for English and Korean speech recognition and synthesis.

Mar 30, 2026
Speech → Text · Any-to-Any
Raon-Speech-9B
HKUSTAudio/Any-to-Any

HKUST Releases Audio-Omni, a Unified Audio Model

The new diffusion-based model handles speech, music, and general audio tasks like conversion and editing within a single, versatile framework.

Mar 27, 2026
Any-to-Any · Music
Audio-Omni
Meituan/Any-to-Any

Meituan Releases LongCat-Next 'Any-to-Any' AI Model

The Chinese tech company has released the weights for a unified model that can process and generate combinations of text, images, audio, and video.

Mar 25, 2026
Text / LLM · Any-to-Any
LongCat-Next
GAIR/Image → Video

GAIR Releases daVinci-MagiHuman for Video Generation

The new open-source model from GAIR can create video clips, complete with audio, from a variety of inputs.

Mar 21, 2026
Image → Video · Any-to-Any
daVinci-MagiHuman
Google DeepMind/Any-to-Any · Major release

Google Releases Compact Gemma 4 E2B Multimodal Model

The new 2-billion-parameter model from Google DeepMind brings efficient image-and-text understanding to the open-source Gemma family.

Mar 2, 2026
Text / LLM · Any-to-Any
Gemma 4 E2B
Google DeepMind/Any-to-Any · Major release

Google's Gemma 4 Arrives with Any-to-Any Multimodal Skills

The new 2-billion-parameter model from DeepMind can process text, images, and audio, making it a versatile and efficient foundation for developers.

Mar 2, 2026
Text / LLM · Any-to-Any
Gemma 4 E2B
Google DeepMind/Any-to-Any

Google Releases Gemma 4 E4B, a 4B Multimodal Model

The new 4-billion-parameter vision-language model brings image and text understanding to Google's popular open-source family.

Mar 2, 2026
Text / LLM · Any-to-Any
Gemma 4 E4B
Google DeepMind/Any-to-Any · Major release

Google's Gemma 4 Debuts with Any-to-Any Multimodality

The new 4-billion-parameter model from Google DeepMind is designed for versatile input and output, handling text, images, and other data types.

Mar 2, 2026
Text / LLM · Any-to-Any
Gemma 4 E4B
inclusionAI/Any-to-Any

inclusionAI's Ming 2.0 Tackles Any-to-Any Multimodality

The new open-source Mixture-of-Experts model can process and generate content across text, images, and audio in any combination.

Feb 10, 2026
Any-to-Any
Ming-flash-omni 2.0
OpenBMB/Any-to-Any

OpenBMB Releases 'Any-to-Any' Multimodal Model

The new MiniCPM-o 4.5 model from the open-source research group can process and generate interleaved combinations of images, text, and audio.

Feb 3, 2026
Any-to-Any · Vision-Language
MiniCPM-o 4.5
OpenBMB/Any-to-Any

MiniCPM-o 4.5 Offers 'Any-to-Any' Multimodal AI

The new model from OpenBMB supports mixed-modality inputs and outputs, from text and images to audio and video, in a single efficient package.

Feb 2, 2026
Any-to-Any · Vision-Language
MiniCPM-o 4.5
OpenMOSS/Any-to-Any

OpenMOSS Releases MOVA, a 720p Multimodal Video Generator

The new open model can generate high-definition video with synchronized audio from a flexible combination of text and image prompts.

Jan 28, 2026
Image → Video · Any-to-Any
MOVA 720p
Qwen · Alibaba/Any-to-Any

Qwen's Fun-Audio-Chat: An Open Speech-to-Speech LLM

The 8-billion-parameter model from Alibaba's Qwen team understands and generates spoken responses, enabling more natural audio-first applications.

Dec 23, 2025
Speech → Text · Any-to-Any
Fun-Audio-Chat-8B
FlashLabs/Any-to-Any

FlashLabs Releases Chroma-4B, an Any-to-Any Model

The new 4-billion-parameter model handles text, image, and speech inputs and outputs, including direct speech-to-speech translation.

Nov 28, 2025
Any-to-Any
Chroma-4B
BAAI/Any-to-Any

BAAI Releases Emu3.5, an 'Any-to-Any' Multimodal Model

The new open-source model from the Beijing Academy of Artificial Intelligence (BAAI) unifies text and image understanding and generation into a single architecture.

Oct 31, 2025
Any-to-Any · Text → Image
Emu3.5
Meituan/Any-to-Any

Meituan Debuts LongCat-Flash-Omni, an Any-to-Any AI Model

The new open-source Mixture-of-Experts model can process and generate any combination of text, images, video, audio, and 3D data.

Oct 23, 2025
Any-to-Any
LongCat-Flash-Omni
Kuaishou/Any-to-Any

Kling Releases UniVideo for Generation and Understanding

The new open-source model combines both video generation and comprehension, a rare dual capability built on the Qwen2.5 vision-language foundation.

Oct 18, 2025
Any-to-Any · Text → Video
UniVideo
inclusionAI/Any-to-Any

inclusionAI Debuts 'Any-to-Any' Multimodal MoE Model

The new Ming-flash-omni-Preview aims to handle any combination of data modalities using an efficient Mixture-of-Experts architecture.

Oct 14, 2025
Any-to-Any
Ming-flash-omni-Preview
inclusionAI/Any-to-Any

inclusionAI Releases Ming-UniVision MoE Multimodal Model

The new 16-billion-parameter model uses a sparse Mixture-of-Experts design to efficiently handle 'any-to-any' data combinations, from text to images.

Sep 30, 2025
Any-to-Any · Vision-Language
Ming-UniVision-16B-A3B
Qwen · Alibaba/Vision-Language · Major release

Qwen Releases 30B MoE Vision Model, Qwen3-VL

The new open-source model from Alibaba uses a Mixture-of-Experts architecture to make its powerful vision-language capabilities more efficient to run.

Sep 30, 2025
Any-to-Any · Vision-Language
Qwen3-VL-8B-Instruct
inclusionAI/Any-to-Any

Ming-UniAudio Brings MoE to Unified Audio AI

A new 16-billion-parameter model from inclusionAI uses a Mixture-of-Experts architecture to handle a wide range of audio tasks efficiently.

Sep 29, 2025
Speech → Text · Any-to-Any
Ming-UniAudio-16B-A3B
Qwen · Alibaba/Any-to-Any · Major release

Qwen3-Omni Arrives With Any-to-Any Multimodality

The new 30B Mixture-of-Experts model from Alibaba's Qwen team can process text, image, audio, and video inputs and respond in both text and speech.

Sep 20, 2025
Speech → Text · Any-to-Any
Qwen3-Omni-30B-A3B-Instruct
Xiaomi/Any-to-Any

Xiaomi's MiMo-Audio 7B Tackles Complex Speech Tasks

This new instruction-tuned model from Xiaomi can handle a flexible combination of audio and text inputs and outputs, from transcription to voice synthesis.

Sep 18, 2025
Speech → Text · Any-to-Any
MiMo-Audio-7B-Instruct
Qwen · Alibaba/Any-to-Any

Qwen Releases 'Thinking' Multimodal MoE Model

The new 30-billion-parameter Mixture-of-Experts model from Alibaba's Qwen team is designed to show its reasoning process for complex multimodal tasks.

Sep 15, 2025
Any-to-Any · Reasoning
Qwen3-Omni-30B-A3B-Thinking
Qwen · Alibaba/Any-to-Any

Qwen Releases 30B Model for Audio Captioning

The new Mixture-of-Experts model from Alibaba is fine-tuned to generate detailed, multilingual descriptions for complex audio content.

Sep 15, 2025
Any-to-Any · Speech → Text
Qwen3-Omni-30B-A3B-Captioner
Alpha-VLLM/Any-to-Any

Lumina-DiMOO: A Diffusion Model for Any-to-Any AI

This new open-source model uses a discrete diffusion architecture, rather than the usual autoregressive approach, to generate and understand a mix of media types.

Sep 9, 2025
Any-to-Any · Text → Image
Lumina-DiMOO
StepFun/Any-to-Any

StepFun Releases Step-Audio 2 mini, a Unified Audio AI

The new open-source model handles both speech recognition and audio generation in a single, end-to-end architecture.

Aug 28, 2025
Speech → Text · Any-to-Any
Step-Audio 2 mini
NexaAI/Any-to-Any

NexaAI Releases OmniNeural-4B for On-Device AI

The new 4-billion-parameter model is designed for 'any-to-any' multimodal tasks and optimized to run efficiently on mobile hardware.

Aug 15, 2025
Any-to-Any
OmniNeural-4B
Skywork/Any-to-Any

Skywork Releases UniPic, a Unified 1.5B Vision Model

The new autoregressive model from the Chinese AI lab can understand, generate, and edit images within a single, compact framework.

Jul 29, 2025
Image Editing · Any-to-Any
Skywork-UniPic-1.5B
inclusionAI/Any-to-Any

Ming-Lite-Omni 1.5 Brings Any-to-Any Modality to Open Source

The new MIT-licensed model from inclusionAI can process and generate a mix of text, images, audio, and video, pushing the boundaries of open multimodal AI.

Jul 15, 2025
Any-to-Any
Ming-Lite-Omni 1.5
ByteDance/Any-to-Any

ByteDance Releases Tar-7B for 'Any-to-Any' Multimodality

The new 7-billion-parameter model from the company's Seed team can both understand and generate images within a single unified framework.

Jul 2, 2025
Any-to-Any
Tar-7B
AIDC-AI/Any-to-Any

Ovis-U1-3B Unifies Image Understanding and Generation

The new 3-billion-parameter model from AIDC-AI combines vision-language understanding and image generation into a single 'any-to-any' framework.

Jun 28, 2025
Any-to-Any · Text → Image
Ovis-U1-3B
FreedomIntelligence/Any-to-Any

Janus-4o-7B Adds Image Generation to 7B Multimodal AI

The new 7-billion-parameter model from FreedomIntelligence can process various inputs and generate or edit images based on text prompts.

Jun 23, 2025
Image Editing · Any-to-Any
Janus-4o-7B