The Open Weights
The daily record of open-source AI. New model releases, leaderboards, and what's coming next — written for people who ship.

© 2026 The Open Weights. An independent publication.

Aggregated by Claude · written with Gemini · curated by humans.

OpenBMB Releases MiniCPM-V for On-Device Vision

The new open-source vision-language model is designed for high-resolution image understanding on mobile and edge devices.

Apr 13, 2026
Notable · Apache 2.0
OpenBMB · Vision-Language
MiniCPM-V-4.6

AI research group OpenBMB has released MiniCPM-V-4.6, a lightweight, open-source vision-language model (VLM) designed to run efficiently on consumer hardware such as mobile phones. The model aims to bring multimodal understanding, until now largely confined to cloud-based services, directly to edge devices.

At its core, MiniCPM-V-4.6 combines the Llama-3-8B-Instruct language model with a SigLIP-400M vision encoder. According to the release details on its Hugging Face repository, the model was trained on a 10-billion-token dataset of high-quality image-text pairs. A key feature is its ability to process images at resolutions of up to 1848x1848 pixels, which the developers claim gives it exceptional optical character recognition (OCR) capabilities.
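To make the resolution figure concrete, here is a minimal sketch of how a client application might pre-scale an image so that neither side exceeds 1848 pixels. It is an illustration only: the helper name `fit_within` and the choice to treat 1848 as a hard per-side cap are assumptions, and the model's own preprocessing may slice or resize images differently.

```python
MAX_SIDE = 1848  # per-side limit quoted above; treating it as a hard cap is an assumption


def fit_within(width: int, height: int, max_side: int = MAX_SIDE) -> tuple[int, int]:
    """Return (width, height) scaled down to fit a max_side x max_side box.

    The aspect ratio is preserved and images that already fit are left unchanged.
    This is a hypothetical helper, not part of any MiniCPM-V API.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    # Never upscale: the scale factor is capped at 1.0.
    scale = min(1.0, max_side / width, max_side / height)
    # Round to the nearest pixel, but keep each side at least 1 pixel.
    return max(1, round(width * scale)), max(1, round(height * scale))


if __name__ == "__main__":
    print(fit_within(4000, 3000))  # a 12-megapixel photo is scaled to (1848, 1386)
    print(fit_within(800, 600))    # a small image is returned as (800, 600)
```

The resulting size could then be passed to any image library's resize call before the image is handed to the model.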

Performance and Features

OpenBMB reports that MiniCPM-V-4.6 demonstrates strong general-purpose visual understanding and instruction-following ability. Key highlights include:

  • High-Resolution Support: Accepts images of up to 1848x1848 pixels, which the developers say enables detailed analysis and stronger OCR.
  • On-Device Focus: Engineered for efficient inference on mobile and edge devices.
  • Open Access: Released under the permissive Apache 2.0 license.

The developers claim the model surpasses several proprietary models, including GPT-4V, in certain open-ended evaluations, highlighting its strength in real-world visual reasoning tasks.

By targeting on-device deployment, MiniCPM-V-4.6 represents a significant step toward making advanced AI more accessible, private, and responsive. Running models locally reduces reliance on network connectivity and lowers latency, opening up new possibilities for real-time multimodal applications on personal devices.

Sources

  • openbmb/MiniCPM-V-4.6 (Hugging Face)

Specs

Parameters: not listed
Context window: not listed
License: Apache 2.0
Downloads: 566.1K

Modalities

Vision-Language
