Google Releases Multimodal Gemma 4 31B Model
The new 31-billion-parameter model is instruction-tuned and can process both text and images, marking a significant expansion for the Gemma family.
Google DeepMind has expanded its open-weights lineup with the release of Gemma 4 31B IT, a powerful new multimodal model. This 31-billion-parameter release marks a significant milestone for the Gemma family, introducing vision-language capabilities at a substantial scale.
As a Vision Language Model (VLM), Gemma 4 31B can interpret and process information from both text and images simultaneously. This allows it to handle tasks that were previously out of reach for its text-only predecessors, such as describing the contents of a photograph or answering questions based on visual information.
The "IT" in the model's name signifies that it is instruction-tuned, meaning it has been specifically optimized to follow user prompts and perform conversational tasks. This fine-tuning makes it more reliable and useful for building interactive applications.
The introduction of a capable VLM under the Gemma name advances Google's open-source AI strategy, providing developers with a strong foundation for building sophisticated multimodal applications. The model is available under the Gemma license and can be accessed by researchers and developers on its Hugging Face repository.
Sources
- Visit
google/gemma-4-31B-it
Hugging Face
0 comments
No comments yet. Be the first to weigh in.
More in Any-to-Any

MiniMax Releases M3, a Multimodal MoE Model
The new open-weight model from MiniMax AI combines vision, coding, and reasoning using a Mixture-of-Experts architecture.
Google Releases Gemma 4 12B Multimodal Model
The new 12-billion-parameter open model from DeepMind introduces a unified 'any-to-any' architecture for advanced multimodal tasks.
Google Releases Gemma 4, a 12B 'Any-to-Any' Model
The new 12-billion-parameter model from Google DeepMind is designed to handle a flexible mix of data types, moving beyond traditional text and image inputs.