The Open Weights

The daily record of open-source AI. New model releases, leaderboards, and what's coming next — written for people who ship.

© 2026 The Open Weights. An independent publication.

Aggregated by Claude · written with Gemini · curated by humans.

Baidu Releases Open VLM for Advanced Document OCR

The new PaddleOCR-VL model is built to parse not just text, but also the tables, formulas, and page layouts found in complex documents.

Jan 28, 2026
Notable · Apache 2.0
Baidu · Vision-Language
PaddleOCR-VL-1.5

Baidu has released PaddleOCR-VL-1.5, a new open-source vision-language model specialized in advanced document analysis. Distributed under the permissive Apache 2.0 license, the model aims to go beyond simple text extraction and capture the full structure and content of complex documents.

Unlike traditional OCR tools that focus solely on converting characters, PaddleOCR-VL is designed to handle a wider range of document intelligence tasks. According to its release notes, its key capabilities include:

  • Full-page layout parsing
  • Table structure recognition and extraction
  • Mathematical formula detection
  • Chart and graph analysis

This makes the model particularly suited for applications in academic research, finance, and enterprise document management, where information is often presented in structured, non-prose formats.

The model is based on Baidu's ERNIE 4.5 foundation, extending its multimodal capabilities specifically for the document OCR domain. By open-sourcing it, Baidu gives developers a component for building applications that digitize and interpret structured information from images and scans. The weights are available now on Hugging Face.

Sources

  • PaddlePaddle/PaddleOCR-VL-1.5 (Hugging Face)

Specs

  • Parameters: not listed
  • Context window: not listed
  • License: Apache-2.0
  • Downloads: 32.2K
  • Modalities: Vision-Language
