The Open Weights

© 2026 The Open Weights. An independent publication.

Aggregated by Claude · written with Gemini · curated by humans.


Baidu Releases PaddleOCR-VL for Document AI

The new vision-language model is fine-tuned to understand not just text but the complex structure of tables, charts, and formulas.

Oct 16, 2025
Baidu · Vision-Language

Baidu has released PaddleOCR-VL, a new open-source vision-language model specialized for complex document understanding. The model aims to go beyond simple text recognition by interpreting the structural elements within a page, a common challenge in automated data processing.

Built on Baidu's ERNIE 4.5 architecture, PaddleOCR-VL targets the optical character recognition (OCR) cases that often trip up traditional systems. Beyond plain text, it parses page layouts, tables, mathematical formulas, and charts.

Because it is a vision-language model (VLM), PaddleOCR-VL can draw on context and treat a document as a cohesive whole rather than a flat sequence of characters. By modeling the relationships between text and visual elements, it can more accurately extract structured data from unstructured sources such as scanned reports or academic papers.
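The structured-extraction step described above can be illustrated with a small downstream sketch. It assumes that a document parser such as PaddleOCR-VL returns a recognized table as a Markdown pipe table. The article does not confirm this. The helper `parse_markdown_table` and the sample table are hypothetical and are not part of PaddleOCR's API.

```python
# Minimal sketch: turn a Markdown pipe table into a list of row dictionaries.
# ASSUMPTION: the document parser emits recognized tables as Markdown pipe
# tables. `parse_markdown_table` is a hypothetical helper, not a PaddleOCR API.


def _split_row(line: str) -> list[str]:
    """Split one '| a | b |' row into stripped cell strings."""
    line = line.strip()
    if line.startswith("|"):
        line = line[1:]
    if line.endswith("|"):
        line = line[:-1]
    return [cell.strip() for cell in line.split("|")]


def _is_separator(cells: list[str]) -> bool:
    """True for the '---' / ':---:' row that separates header from body."""
    return all(c and set(c) <= set("-:") for c in cells)


def parse_markdown_table(markdown: str) -> list[dict[str, str]]:
    """Return one dict per body row, keyed by the header cells.

    Ignores escaped pipes and multi-line cells; a sketch, not a full GFM parser.
    """
    lines = [ln for ln in markdown.splitlines() if ln.strip()]
    if len(lines) < 2:
        return []
    header = _split_row(lines[0])
    if not _is_separator(_split_row(lines[1])):
        raise ValueError("second line is not a header separator row")
    records = []
    for line in lines[2:]:
        cells = _split_row(line)
        # Pad short rows so every header key is present; extra cells are dropped by zip.
        cells += [""] * (len(header) - len(cells))
        records.append(dict(zip(header, cells)))
    return records


if __name__ == "__main__":
    # A made-up sample table standing in for parser output.
    sample = """
| Region | Q1 | Q2 |
|--------|---:|---:|
| North  | 120 | 135 |
| South  | 98  | 101 |
"""
    for row in parse_markdown_table(sample):
        print(row)
```

Running the file prints one dictionary per data row. Every cell stays a string, and type conversion is left to the caller, because OCR output can contain stray characters that make automatic numeric parsing unreliable.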

PaddleOCR-VL gives developers an openly licensed model for document-intelligence and automation pipelines. The release reflects a growing trend of applying large multimodal models to specific, high-value problems in data extraction and analysis. The model is available on Hugging Face under an Apache 2.0 license.

Sources

  • PaddlePaddle/PaddleOCR-VL (Hugging Face)



Specs

  • Parameters: not listed
  • Context window: not listed
  • License: Other
  • Downloads: 4.9K
  • Modalities: Vision-Language

More in Vision-Language

Kimi-K2.7-Code
Moonshot AI · Code

Moonshot AI Releases Kimi, a Multimodal Coding Model

The new Mixture-of-Experts model from the Chinese AI company can generate code while also understanding visual inputs, a rare combination in open models.

Jun 11, 2026
DiffusionGemma 26B-A4B Instruct
Google DeepMind · Text / LLM

Google Releases Open-Source DiffusionGemma 26B Model

The new 26B-parameter model from DeepMind uses a diffusion-based architecture, a technique more common in image generation, to produce text.

Jun 9, 2026
MiniMax-M3
MiniMax · Vision-Language

MiniMax Releases M3, a Multimodal MoE Model

The new open-weight model from MiniMax AI combines vision, coding, and reasoning using a Mixture-of-Experts architecture.

Jun 2, 2026