Datalab Releases Chandra, a New OCR Vision Model

The new vision-language model from Datalab is fine-tuned from Qwen2-VL to specialize in extracting text and structure from complex documents.

Oct 21, 2025

Notable / OpenRAIL-M

Datalab has introduced Chandra, a new open-source model designed to tackle the complex challenge of optical character recognition (OCR) and document understanding. As a vision-language model (VLM), Chandra goes beyond simple text extraction, aiming to interpret the layout and structure of documents like forms, receipts, and invoices.

The model is a specialized fine-tune of Alibaba's Qwen2-VL-7B, which gives it a strong foundation in both visual perception and language comprehension. Datalab has released Chandra under an OpenRAIL-M license, which permits a wide range of uses while imposing certain use restrictions intended to encourage responsible deployment of the technology.

While traditional OCR tools are effective at converting clean, printed text into digital formats, they often falter with varied layouts, tables, or handwritten content. By leveraging a VLM architecture, Chandra can analyze a document holistically, understanding the relationship between different visual elements and the text they contain. This capability is key for automating data entry and digitizing complex archives more accurately.

Key Applications

Extracting structured data from invoices and receipts.
Parsing complex tables and forms.
Digitizing handwritten notes and annotations.
Analyzing documents with multi-column layouts.
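
To make the first two applications concrete, here is a minimal sketch of a downstream step: turning a table that an OCR model has transcribed as text into structured rows. It assumes, purely for illustration, that the model's output arrives as Markdown with a pipe-delimited table; the article does not specify Chandra's output format. The `parse_markdown_table` helper and the sample receipt are hypothetical and are not part of any Datalab API.

```python
from typing import Dict, List


def parse_markdown_table(text: str) -> List[Dict[str, str]]:
    """Return the first pipe-delimited Markdown table in `text` as row dicts."""
    rows: List[List[str]] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line.startswith("|"):
            if rows:
                break  # the first table has ended
            continue  # still before the table
        cells = [c.strip() for c in line.strip("|").split("|")]
        # Skip the header separator row, e.g. |-----|:---:|
        if all(c and set(c) <= set("-: ") for c in cells):
            continue
        rows.append(cells)
    if not rows:
        return []
    header, body = rows[0], rows[1:]
    return [dict(zip(header, r)) for r in body]


# Hypothetical OCR output for a receipt (an assumption, not real model output).
sample_output = """\
Receipt #1042

| Item | Qty | Price |
|------|-----|-------|
| Coffee | 2 | 3.50 |
| Bagel | 1 | 2.25 |

Thank you!
"""

items = parse_markdown_table(sample_output)
total = sum(int(r["Qty"]) * float(r["Price"]) for r in items)
print(items)
print(f"total = {total:.2f}")  # prints total = 9.25
```

Keeping the parsing step separate from the OCR step means the same code can be reused if the upstream model or its output format changes; only the sample text and the parser's assumptions would need to be revisited.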

Chandra represents a focused application of large vision models to a persistent business problem. For developers working on document processing pipelines, it offers a powerful new tool for improving accuracy and automation. The model and further documentation are available on the Hugging Face Hub.

Sources

datalab-to/chandra
Hugging Face