Google DeepMindAny-to-Any

Google Releases Gemma 4 E4B, a 4B Multimodal Model

The new 4-billion-parameter vision-language model brings image and text understanding to Google's popular open-source family.

Mar 2, 2026

NotableGemma

Google DeepMind has expanded its open-source Gemma family with the release of Gemma 4 E4B, a new 4-billion-parameter multimodal model. This marks a significant step for the Gemma series, introducing vision capabilities to the previously text-focused lineup.

Unlike its predecessors, Gemma 4 E4B is a vision-language model (VLM) built to process and reason about images and text simultaneously. Following an "image-text-to-text" architecture, it can analyze visual information alongside textual prompts to generate relevant text-based responses. This allows it to handle tasks like visual question answering and image-based content generation.

The model's designation suggests a focus on efficiency at the 4-billion-parameter scale. By providing a relatively compact VLM, Google is targeting developers who need to build multimodal applications without relying on the extensive computational resources required by larger, proprietary models. This makes it a compelling option for use cases on consumer hardware or in other resource-constrained environments.

Gemma 4 E4B is available now on Hugging Face under the Gemma license, which allows for commercial use and distribution. Its release provides the open-source community with another powerful and accessible tool for building the next generation of AI applications.

Sources

google/gemma-4-E4B
Hugging Face
Visit

0 comments

No comments yet. Be the first to weigh in.

Thinking Machines Debuts Inkling Small, a Compact Multimodal MoE

The Apache-2.0 model brings mixture-of-experts efficiency to image, audio, and text tasks in a smaller footprint.

Jul 27, 2026

KRAFTON/Any-to-Any

KRAFTON releases A.X-K2 Raon speech MoE model

The game maker's new open model blends text-to-speech and speech recognition in a single 21B mixture-of-experts system with just 3B active parameters.

Jul 27, 2026

Microsoft/Vision-Language

Microsoft's Mage-VL Streams Video Natively

A codec-native multimodal foundation model aims to understand live video and vision-language input in real time.

Jul 26, 2026

Google DeepMindAny-to-Any

Google Releases Gemma 4 E4B, a 4B Multimodal Model

The new 4-billion-parameter vision-language model brings image and text understanding to Google's popular open-source family.

Mar 2, 2026

NotableGemma

Sources

google/gemma-4-E4B
Hugging Face
Visit

0 comments

No comments yet. Be the first to weigh in.

Thinking Machines Debuts Inkling Small, a Compact Multimodal MoE

The Apache-2.0 model brings mixture-of-experts efficiency to image, audio, and text tasks in a smaller footprint.

Jul 27, 2026

KRAFTON/Any-to-Any

KRAFTON releases A.X-K2 Raon speech MoE model

The game maker's new open model blends text-to-speech and speech recognition in a single 21B mixture-of-experts system with just 3B active parameters.

Jul 27, 2026

Microsoft/Vision-Language

Microsoft's Mage-VL Streams Video Natively

A codec-native multimodal foundation model aims to understand live video and vision-language input in real time.

Jul 26, 2026