Baidu Releases Qianfan-OCR for Document Intelligence
The new vision-language model from the Chinese tech giant is designed for complex, multilingual optical character recognition and layout analysis.
Chinese technology company Baidu has released Qianfan-OCR, a new vision-language model specialized for optical character recognition and document understanding. The model is aimed at developers who need to extract text and structural information from complex documents across multiple languages.
As a document intelligence model, Qianfan-OCR is designed to go beyond simple text transcription. Its capabilities include recognizing tables, analyzing page layouts, and handling a wide variety of languages. This makes it suitable for digitizing complex materials like invoices, structured forms, and academic papers that mix text with other elements.
A New Tool for Digitization
The release adds another option to the growing ecosystem of openly available models for document processing. Qianfan-OCR targets multilingual enterprise and archival applications, where documents often contain complex formatting. Accurate extraction from such documents matters both to businesses automating data entry and to researchers digitizing large volumes of text.
The model weights and usage instructions are available on the Hugging Face Hub. Potential users should note that it is released under a custom End User License Agreement, which may place restrictions on certain use cases compared to more permissive open-source licenses.
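For readers who want to inspect the files locally, the following is a minimal sketch of fetching the repository with the `huggingface_hub` Python library. It assumes the library is installed and that the repository ID matches the `baidu/Qianfan-OCR` listing in the Sources below. The function name `download_weights` and the default target directory are illustrative, not part of Baidu's instructions. If the repository is gated, the download may also require logging in and accepting the license on the Hub first.

```python
from pathlib import Path

# Repository name as it appears in the Sources listing of this article.
REPO_ID = "baidu/Qianfan-OCR"


def download_weights(target_dir: str = "qianfan-ocr") -> Path:
    """Download a snapshot of the model repository and return its local path.

    `target_dir` is a hypothetical default chosen for this sketch.
    """
    # Imported lazily so this module can be loaded without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    # snapshot_download fetches every file in the repository, including
    # the license text and any usage instructions published alongside the weights.
    local_path = snapshot_download(repo_id=REPO_ID, local_dir=target_dir)
    return Path(local_path)
```

Calling `download_weights()` performs the download over the network. Reading the license file in the resulting directory before running the model is a sensible first step, given the custom terms noted above.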
Sources
- baidu/Qianfan-OCR (Hugging Face)