Google Releases Gemma 4 E4B, a 4B Multimodal Model
The new 4-billion-parameter vision-language model brings image and text understanding to Google's popular open-source family.
Google DeepMind has expanded its open-source Gemma family with the release of Gemma 4 E4B, a new 4-billion-parameter multimodal model. This marks a significant step for the Gemma series, introducing vision capabilities to the previously text-focused lineup.
Unlike its predecessors, Gemma 4 E4B is a vision-language model (VLM) built to process and reason about images and text simultaneously. Following an "image-text-to-text" architecture, it can analyze visual information alongside textual prompts to generate relevant text-based responses. This allows it to handle tasks like visual question answering and image-based content generation.
The model's designation suggests a focus on efficiency at the 4-billion-parameter scale. By providing a relatively compact VLM, Google is targeting developers who need to build multimodal applications without relying on the extensive computational resources required by larger, proprietary models. This makes it a compelling option for use cases on consumer hardware or in other resource-constrained environments.
Gemma 4 E4B is available now on Hugging Face under the Gemma license, which allows for commercial use and distribution. Its release provides the open-source community with another powerful and accessible tool for building the next generation of AI applications.
Sources
- Visit
google/gemma-4-E4B
Hugging Face
0 comments
No comments yet. Be the first to weigh in.
More in Any-to-Any

MiniMax Releases M3, a Multimodal MoE Model
The new open-weight model from MiniMax AI combines vision, coding, and reasoning using a Mixture-of-Experts architecture.
Google Releases Gemma 4 12B Multimodal Model
The new 12-billion-parameter open model from DeepMind introduces a unified 'any-to-any' architecture for advanced multimodal tasks.
Google Releases Gemma 4, a 12B 'Any-to-Any' Model
The new 12-billion-parameter model from Google DeepMind is designed to handle a flexible mix of data types, moving beyond traditional text and image inputs.