Google DeepMindAny-to-Any

Google Releases Gemma 4 Multimodal Open Model

The new 26-billion-parameter model from DeepMind uses a mixture-of-experts design for greater efficiency and is tuned for assistant-style tasks.

Apr 23, 2026

NotableApache 2.0

Google DeepMind has expanded its open-source offerings with the release of Gemma 4, a new generation of its popular model family. The initial variant, Gemma 4 26B-A4B Instruct, is a powerful multimodal model designed for conversational AI and assistant tasks.

Efficient by Design

Gemma 4 employs a sparse Mixture-of-Experts (MoE) architecture, a design choice that balances model scale with computational cost. While the model contains a total of 26 billion parameters, only 4 billion are active during inference for any given input. This approach allows Gemma 4 to achieve the performance associated with larger models while requiring significantly less processing power.

Key specifications include:

Total Parameters: 26 billion
Active Parameters: 4 billion
Modalities: Text and Vision (VLM)
License: Apache 2.0

This release is notable for its multimodal capabilities. As a vision-language model (VLM), Gemma 4 can understand and process both text and image inputs, making it suitable for a wide range of applications from image captioning to visual Q&A. This specific version is instruction-tuned, meaning it has been optimized to follow user prompts and engage in helpful dialogue.

By releasing Gemma 4 under the permissive Apache 2.0 license, Google continues to support the open-source AI community. This allows developers and researchers to freely build upon, modify, and deploy the model for both academic and commercial purposes, further accelerating innovation in the field.

Sources

google/gemma-4-26B-A4B-it-assistant
Hugging Face
Visit

0 comments

No comments yet. Be the first to weigh in.

Thinking Machines Debuts Inkling Small, a Compact Multimodal MoE

The Apache-2.0 model brings mixture-of-experts efficiency to image, audio, and text tasks in a smaller footprint.

Jul 27, 2026

KRAFTON/Any-to-Any

KRAFTON releases A.X-K2 Raon speech MoE model

The game maker's new open model blends text-to-speech and speech recognition in a single 21B mixture-of-experts system with just 3B active parameters.

Jul 27, 2026

Microsoft/Vision-Language

Microsoft's Mage-VL Streams Video Natively

A codec-native multimodal foundation model aims to understand live video and vision-language input in real time.

Jul 26, 2026

Efficient by Design

Key specifications include:

Total Parameters: 26 billion

Active Parameters: 4 billion

Modalities: Text and Vision (VLM)

License: Apache 2.0