
Baidu Releases Qianfan-OCR for Document Intelligence

The new vision-language model from the Chinese tech giant is designed for complex, multilingual optical character recognition and layout analysis.

Mar 18, 2026


Chinese technology company Baidu has released Qianfan-OCR, a new vision-language model specialized for optical character recognition and document understanding. The model is aimed at developers who need to extract text and structural information from complex documents across multiple languages.

As a document intelligence model, Qianfan-OCR is designed to go beyond simple text transcription. Its capabilities include recognizing tables, analyzing page layouts, and handling a wide variety of languages. This makes it suitable for digitizing complex materials like invoices, structured forms, and academic papers that mix text with other elements.

A New Tool for Digitization

The release adds another option to the growing ecosystem of openly available models for document processing. Baidu's entry targets multilingual enterprise and archival applications, where documents often contain complex formatting. Reliable extraction from such documents matters both to businesses automating data entry and to researchers digitizing large volumes of text.

The model weights and usage instructions are available on the Hugging Face Hub. Potential users should note that it is released under a custom End User License Agreement, which may place restrictions on certain use cases compared to more permissive open-source licenses.
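For developers who want to try the model, a download could look something like the minimal Python sketch below. It assumes the `huggingface_hub` package is installed and that the repository name matches the one listed under Sources. The `fetch_weights` and `find_license_files` helpers and the `qianfan-ocr` directory name are hypothetical, chosen here for illustration. Whether the repository requires accepting the license terms before download is not stated in the announcement.

```python
from pathlib import Path

# Repository name as listed under Sources; everything else here is illustrative.
REPO_ID = "baidu/Qianfan-OCR"


def fetch_weights(repo_id: str = REPO_ID, local_dir: str = "qianfan-ocr") -> Path:
    """Download a snapshot of the model repository and return its local path."""
    # Imported lazily so the helpers below work without the dependency installed.
    from huggingface_hub import snapshot_download

    return Path(snapshot_download(repo_id=repo_id, local_dir=local_dir))


def find_license_files(repo_dir: Path) -> list[Path]:
    """List files whose names contain 'license', to review before any deployment."""
    return sorted(
        p for p in repo_dir.rglob("*")
        if p.is_file() and "license" in p.name.lower()
    )


if __name__ == "__main__":
    repo_dir = fetch_weights()
    for path in find_license_files(repo_dir):
        print(f"Review before use: {path}")
```

Listing license files right after the download is a small design choice that reflects the custom End User License Agreement mentioned above: the terms are worth reading before the model goes into any pipeline.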

Sources

baidu/Qianfan-OCR, Hugging Face


