DocBench: A Benchmark for Evaluating LLM-based Document Reading Systems
Anni Zou, Wenhao Yu, Hongming Zhang, Kaixin Ma, Deng Cai, Zhuosheng Zhang, Hai Zhao, Dong Yu
Abstract
Recent advancements in proprietary large language models (LLMs), such as those from OpenAI and Anthropic, have led to the development of document reading systems capable of handling raw files with complex layouts, intricate formatting, lengthy content, and multi-modal information. However, the absence of a standardized benchmark hinders objective evaluation of these systems. To address this gap, we introduce DocBench, a benchmark designed to simulate real-world scenarios, where each instance consists of a raw document paired with one or more questions. DocBench uniquely evaluates entire document reading systems and adopts a user-centric approach, allowing users to identify the system best suited to their needs.
- Anthology ID: 2025.knowledgenlp-1.29
- Volume: Proceedings of the 4th International Workshop on Knowledge-Augmented Methods for Natural Language Processing
- Month: May
- Year: 2025
- Address: Albuquerque, New Mexico, USA
- Editors: Weijia Shi, Wenhao Yu, Akari Asai, Meng Jiang, Greg Durrett, Hannaneh Hajishirzi, Luke Zettlemoyer
- Venues: KnowledgeNLP | WS
- Publisher: Association for Computational Linguistics
- Pages: 359–373
- URL: https://preview.aclanthology.org/moar-dois/2025.knowledgenlp-1.29/
- DOI: 10.18653/v1/2025.knowledgenlp-1.29
- Cite (ACL): Anni Zou, Wenhao Yu, Hongming Zhang, Kaixin Ma, Deng Cai, Zhuosheng Zhang, Hai Zhao, and Dong Yu. 2025. DocBench: A Benchmark for Evaluating LLM-based Document Reading Systems. In Proceedings of the 4th International Workshop on Knowledge-Augmented Methods for Natural Language Processing, pages 359–373, Albuquerque, New Mexico, USA. Association for Computational Linguistics.
- Cite (Informal): DocBench: A Benchmark for Evaluating LLM-based Document Reading Systems (Zou et al., KnowledgeNLP 2025)
- PDF: https://preview.aclanthology.org/moar-dois/2025.knowledgenlp-1.29.pdf