Google DeepMind/Any-to-Any

Google Releases 2B Multimodal Gemma 4 Assistant Model

The new compact model from DeepMind is instruction-tuned for "any-to-any" tasks, capable of processing and generating mixed data types.

Apr 23, 2026


Google DeepMind has released a new addition to its open-source Gemma family: a 2-billion-parameter model designed for multimodal assistant tasks. Dubbed "Gemma 4 E2B-it Assistant," the model is notably compact, aiming to bring sophisticated capabilities to a wider range of hardware.

This release is an instruction-tuned variant, meaning it has been fine-tuned to follow user instructions and hold conversations. Its key feature is an "any-to-any" design that lets it both accept and produce a mix of data types beyond text, a notable capability for a model of its size.

Compact Multimodality

The model's combination of a small parameter count and advanced multimodal features makes it particularly interesting. While larger models have long handled mixed inputs, a capable 2B model opens up new possibilities for developers building applications for edge devices, specialized agents, or scenarios where computational resources are constrained.

The Gemma 4 E2B-it Assistant is released under the permissive Apache 2.0 license, which allows both research and commercial use. The model is available now on Hugging Face.

Sources

google/gemma-4-E2B-it-assistant (Hugging Face)

