Benchmarking Vision-Language Models on Chinese Ancient Documents: From OCR to Knowledge Reasoning

Haiyang Yu, Yuchuan Wu, Fan Shi, Jinghui Lu, Ke Niu, Xiaodong Ge, Minghan Zhuo, Jingqun Tang, Bin Li

Abstract

Chinese ancient documents, invaluable carriers of millennia of Chinese history and culture, hold rich knowledge across diverse fields but remain difficult to digitize and understand: traditional methods merely scan page images, while current Vision-Language Models (VLMs) struggle with the documents' visual and linguistic complexity. Existing document benchmarks focus on English printed text or simplified Chinese, leaving a gap in the evaluation of VLMs on ancient Chinese documents. To address this gap, we present AncientDoc, the first benchmark for Chinese ancient documents, designed to assess VLMs on capabilities ranging from OCR to knowledge reasoning. AncientDoc includes five tasks (page-level OCR, vernacular translation, reasoning-based QA, knowledge-based QA, and linguistic variant QA) and covers 14 document types, over 100 books, and about 3,000 pages. Based on AncientDoc, we evaluate mainstream VLMs using multiple metrics, supplemented by a human-aligned large language model for scoring.
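The abstract does not name the metrics used for the page-level OCR task. As an illustration only, the sketch below computes character error rate (CER), one common choice for scoring OCR output, assuming the model's transcription is compared character by character against a reference transcription. The function names `levenshtein` and `character_error_rate` are hypothetical and are not taken from AncientDoc.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Character-level edit distance (insertions, deletions, substitutions)."""
    # prev[j] holds the distance between the first i-1 reference chars
    # and the first j hypothesis chars (single-row dynamic programming).
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # delete a reference character
                curr[j - 1] + 1,         # insert a hypothesis character
                prev[j - 1] + (r != h),  # substitute (free if characters match)
            ))
        prev = curr
    return prev[-1]


def character_error_rate(ref: str, hyp: str) -> float:
    """Edit distance normalized by reference length (hypothetical helper)."""
    if not ref:
        raise ValueError("reference transcription must be non-empty")
    return levenshtein(ref, hyp) / len(ref)


if __name__ == "__main__":
    # One substitution (時 -> 时) and one deletion (之) over 5 reference chars.
    print(character_error_rate("學而時習之", "學而时習"))
```

Note that a strict character match counts a simplified form such as 时 as an error against the traditional 時. Whether such variants should be penalized is exactly the kind of question a linguistic-variant task has to settle, so a real evaluation may normalize variants before scoring.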

Anthology ID:: 2026.findings-acl.1438
Volume:: Findings of the Association for Computational Linguistics: ACL 2026
Month:: July
Year:: 2026
Address:: San Diego, California, United States
Editors:: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:: Findings
Publisher:: Association for Computational Linguistics
Pages:: 28793–28812
URL:: https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1438/
Cite (ACL):: Haiyang Yu, Yuchuan Wu, Fan Shi, Jinghui Lu, Ke Niu, Xiaodong Ge, Minghan Zhuo, Jingqun Tang, and Bin Li. 2026. Benchmarking Vision-Language Models on Chinese Ancient Documents: From OCR to Knowledge Reasoning. In Findings of the Association for Computational Linguistics: ACL 2026, pages 28793–28812, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):: Benchmarking Vision-Language Models on Chinese Ancient Documents: From OCR to Knowledge Reasoning (Yu et al., Findings 2026)
PDF:: https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1438.pdf
Checklist:: 2026.findings-acl.1438.checklist.pdf
