Google Releases Gemma 4, a 26B Vision-Language Model
The new open-source model from DeepMind uses a Mixture-of-Experts architecture to handle both text and image inputs efficiently.
Google DeepMind has expanded its open-source offerings with the release of Gemma 4 26B Instruct, a new vision-language model. Published under a permissive Apache 2.0 license, this model is designed to understand and process both text and images, making it a versatile tool for multimodal applications.
An Efficient Multimodal Architecture
The key innovation in Gemma 4 is its Mixture-of-Experts (MoE) architecture. While the model contains a total of 26 billion parameters, it is designed for efficiency: only a fraction of them is activated for any given input. The "A4B" in the repository name (google/gemma-4-26B-A4B-it) suggests that approximately 4 billion parameters, roughly 15% of the total, are active at a time, offering strong performance without the full computational cost of a dense 26B model.
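To make the idea of "active" versus "total" parameters concrete, here is a minimal sketch of top-k expert routing in plain Python. It is a toy illustration of the general MoE technique, not Gemma 4's actual implementation: the class name `ToyMoELayer`, the layer sizes, the number of experts, and the choice of `top_k` are all assumptions made for the example, and the article does not describe how Gemma 4 routes its inputs.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


class ToyMoELayer:
    """Hypothetical toy MoE layer: a router picks top_k experts per input."""

    def __init__(self, dim, num_experts, top_k, seed=0):
        rng = random.Random(seed)
        self.dim = dim
        self.top_k = top_k
        # Router: one scoring vector per expert.
        self.router = [[rng.gauss(0, 1) for _ in range(dim)]
                       for _ in range(num_experts)]
        # Each expert is a dim x dim linear map (a stand-in for a real FFN).
        self.experts = [[[rng.gauss(0, 1) for _ in range(dim)]
                         for _ in range(dim)]
                        for _ in range(num_experts)]

    def forward(self, x):
        # Score every expert, but only run the top_k highest-scoring ones.
        scores = [dot(w, x) for w in self.router]
        chosen = sorted(range(len(scores)),
                        key=lambda i: scores[i], reverse=True)[:self.top_k]
        gates = softmax([scores[i] for i in chosen])
        out = [0.0] * self.dim
        for gate, idx in zip(gates, chosen):
            expert_out = [dot(row, x) for row in self.experts[idx]]
            out = [o + gate * e for o, e in zip(out, expert_out)]
        return out, chosen

    def param_counts(self):
        """Return (total, active) parameter counts for one forward pass."""
        per_expert = self.dim * self.dim
        router = len(self.router) * self.dim
        total = router + per_expert * len(self.experts)
        active = router + per_expert * self.top_k
        return total, active


# Assumed toy sizes, chosen only for illustration.
layer = ToyMoELayer(dim=8, num_experts=16, top_k=2)
output, chosen_experts = layer.forward([1.0] * 8)
total, active = layer.param_counts()
print(f"experts used: {chosen_experts}")
print(f"active fraction: {active / total:.2f}")
```

In this sketch every expert's weights must still be stored, but only the selected experts are evaluated, which is the sense in which an MoE model can hold 26 billion parameters while computing with about 4 billion of them at a time.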
As an instruction-tuned model, Gemma 4 26B is optimized to follow user prompts and commands, making it suitable for a wide range of chat and assistant-style applications. Researchers and developers can access the model and its technical details on its Hugging Face repository.
This release signals Google's continued investment in the open-source AI ecosystem. Because only a fraction of the model's parameters is active at a time, the MoE design reduces the compute cost relative to a dense model of the same size, making advanced vision-language capabilities more accessible to the community.
Sources
- google/gemma-4-26B-A4B-it (Hugging Face)