The Open Weights

The daily record of open-source AI. New model releases, leaderboards, and what's coming next — written for people who ship.

© 2026 The Open Weights. An independent publication.

Aggregated by Claude · written with Gemini · curated by humans.

New VLM `dots.ocr` Takes on Complex Documents

The new 3B-parameter model from rednote-hilab uses a vision-language approach to parse tables, layouts, and even mathematical formulas.

Jul 30, 2025
rednote-hilab · Vision-Language

Researchers at rednote-hilab have released dots.ocr, a new open-source model designed for sophisticated document understanding. At 3 billion parameters, this vision-language model (VLM) moves beyond simple text extraction to interpret the complex structure of a page.

dots.ocr applies a vision-language approach to Optical Character Recognition (OCR), unifying layout detection and content recognition in a single model; its model card describes it as built on a compact 1.7B-parameter language model. Instead of merely identifying characters in sequence, it models the spatial relationships between elements, which lets it make sense of a document's overall layout.

Advanced Document Parsing

The model's capabilities make it particularly well-suited for digitizing challenging content. It excels at:

  • Layout Analysis: Identifying columns, headers, and figures.
  • Table Extraction: Accurately parsing rows and columns from structured tables.
  • Formula Recognition: Transcribing complex mathematical and scientific notation, a common failure point for traditional OCR systems.
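To make the three capabilities above concrete, here is a minimal sketch of how an application might consume structured layout output once a page has been parsed. The `LayoutElement` schema, the category names, and the `reading_order` helper are all hypothetical and chosen for illustration; they are not taken from the dots.ocr repository, so check the model card for the actual output format.

```python
from dataclasses import dataclass


@dataclass
class LayoutElement:
    """One parsed page region. Hypothetical schema, not a confirmed dots.ocr format."""
    category: str                      # e.g. "Section-header", "Text", "Table", "Formula"
    bbox: tuple[int, int, int, int]    # (x1, y1, x2, y2) in pixels, origin at top-left
    text: str                          # recognized content for the region


def reading_order(elements, column_split_x):
    """Order elements left column first, then top to bottom within each column."""
    def key(el):
        x1, y1, _, _ = el.bbox
        column = 0 if x1 < column_split_x else 1
        return (column, y1)
    return sorted(elements, key=key)


def by_category(elements, category):
    """Return only the elements of one category, e.g. every table on the page."""
    return [el for el in elements if el.category == category]


# Made-up elements from a two-column page, deliberately listed out of order.
page = [
    LayoutElement("Table", (320, 400, 600, 700), "<table><tr><td>1</td></tr></table>"),
    LayoutElement("Text", (20, 120, 300, 380), "Body paragraph."),
    LayoutElement("Formula", (320, 60, 600, 110), r"E = mc^2"),
    LayoutElement("Section-header", (20, 60, 300, 100), "1 Introduction"),
]

ordered = reading_order(page, column_split_x=310)
ordered_categories = [el.category for el in ordered]
tables = by_category(page, "Table")
print(ordered_categories)
```

The fixed `column_split_x` threshold is a simplification for a two-column page; a layout-aware model is meant to make this kind of hand-written heuristic unnecessary.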

The release of dots.ocr gives developers building document intelligence applications an openly licensed alternative. Because it handles formats that often require manual cleanup, such as multi-column layouts, tables, and formulas, it can help automate data extraction from scientific papers, financial reports, and technical manuals. The model and usage examples are available on its Hugging Face repository.

Sources

  • rednote-hilab/dots.ocr (Hugging Face)

Specs

  • Parameters: 3B
  • Context window: not listed
  • License: Other
  • Downloads: 170.2K
  • Modalities: Vision-Language

More in Vision-Language

Moonshot AI · Code
Kimi-K2.7-Code

Moonshot AI Releases Kimi, a Multimodal Coding Model

The new Mixture-of-Experts model from the Chinese AI company can generate code while also understanding visual inputs, a rare combination in open models.

Jun 11, 2026

Google DeepMind · Text / LLM
DiffusionGemma 26B-A4B Instruct

Google Releases Open-Source DiffusionGemma 26B Model

The new 26B parameter model from DeepMind uses a diffusion-based architecture, a technique more common in image generation, to produce text.

Jun 9, 2026

MiniMax · Vision-Language
MiniMax-M3

MiniMax Releases M3, a Multimodal MoE Model

The new open-weight model from MiniMax AI combines vision, coding, and reasoning using a Mixture-of-Experts architecture.

Jun 2, 2026