The Open Weights

The daily record of open-source AI. New model releases, leaderboards, and what's coming next — written for people who ship.

© 2026 The Open Weights. An independent publication.

Aggregated by Claude · written with Gemini · curated by humans.


DeepSeek-OCR Tackles Document Parsing with Vision AI

The new vision-language model uses a novel context compression technique to efficiently extract text and structure from complex documents.

Oct 17, 2025
Major release · MIT
DeepSeek · Vision-Language

AI company DeepSeek has released DeepSeek-OCR, a new open-source model aimed at improving how machines read and understand documents. Licensed under the permissive MIT license, the model combines computer vision with language processing to go beyond simple text extraction, interpreting the layout and structure of complex pages.

The key innovation behind DeepSeek-OCR is a technique the company calls "optical context compression." Rather than handing the language model every visual token from a full, high-resolution page image, the model first compresses the page into a compact visual representation. This compressed "optical context" is then fed to a language model, which has far fewer tokens to process per page. That reduction is what makes the analysis of multi-page documents more efficient.
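To make the efficiency argument concrete, the following is a minimal back-of-the-envelope sketch of how a fixed-ratio compression step shrinks the number of tokens a language model has to read. It is not DeepSeek's implementation: the function names are hypothetical, and the 16-pixel patch size and 16x compression ratio are illustrative assumptions, not figures taken from this article.

```python
import math


def patch_tokens(width: int, height: int, patch: int = 16) -> int:
    """Tokens a patch-based vision encoder would emit for one page image.

    The 16-pixel patch size is an assumed, illustrative value.
    """
    return math.ceil(width / patch) * math.ceil(height / patch)


def compressed_tokens(n_tokens: int, ratio: int = 16) -> int:
    """Tokens left after a hypothetical fixed-ratio compression step."""
    return math.ceil(n_tokens / ratio)


def document_budget(pages: int, width: int, height: int,
                    patch: int = 16, ratio: int = 16) -> dict:
    """Compare token counts for a whole document with and without compression."""
    per_page = patch_tokens(width, height, patch)
    raw = pages * per_page
    packed = pages * compressed_tokens(per_page, ratio)
    return {"raw": raw, "compressed": packed, "saving": 1 - packed / raw}


# A 10-page document scanned at 1024x1024 pixels per page (assumed sizes).
budget = document_budget(pages=10, width=1024, height=1024)
print(budget)  # {'raw': 40960, 'compressed': 2560, 'saving': 0.9375}
```

Under these assumed numbers, the language model reads 2,560 tokens instead of 40,960 for the same ten pages. The real savings depend on the model's actual encoder and resolution settings, which are documented in its repository.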

This two-stage design, compression followed by language modeling, lets the model handle tasks that go beyond plain transcription. Once a document has been compressed, users can query its content through the language model, enabling operations such as:

  • Targeted information extraction
  • Document-grounded question answering
  • Summarization of tables and text

By open-sourcing the model, DeepSeek gives developers a tool for building data entry automation, archival digitization, and accessibility applications. The approach represents a move away from traditional OCR systems, which often falter on complex layouts, toward a more holistic understanding of documents. The model and its technical details are available in its Hugging Face repository.

Sources

  • deepseek-ai/DeepSeek-OCR (Hugging Face)


Specs

  • Parameters: not listed
  • Context window: not listed
  • License: MIT
  • Downloads: 1.7M
  • Modalities: Vision-Language
