DocBench: A Benchmark for Evaluating LLM-based Document Reading Systems
Anni Zou, Wenhao Yu, Hongming Zhang, Kaixin Ma, Deng Cai, Zhuosheng Zhang, Hai Zhao, Dong Yu
Abstract
Recent advancements in proprietary large language models (LLMs), such as those from OpenAI and Anthropic, have led to the development of document reading systems capable of handling raw files with complex layouts, intricate formatting, lengthy content, and multi-modal information. However, the absence of a standardized benchmark hinders objective evaluation of these systems. To address this gap, we introduce DocBench, a benchmark designed to simulate real-world scenarios, where each instance consists of a raw document paired with one or more questions. DocBench uniquely evaluates entire document reading systems and adopts a user-centric approach, allowing users to identify the system best suited to their needs.
- Anthology ID: 2025.knowledgenlp-1.29
- Volume: Proceedings of the 4th International Workshop on Knowledge-Augmented Methods for Natural Language Processing
- Month: May
- Year: 2025
- Address: Albuquerque, New Mexico, USA
- Editors: Weijia Shi, Wenhao Yu, Akari Asai, Meng Jiang, Greg Durrett, Hannaneh Hajishirzi, Luke Zettlemoyer
- Venues: KnowledgeNLP | WS
- Publisher: Association for Computational Linguistics
- Pages: 359–373
- URL: https://preview.aclanthology.org/moar-dois/2025.knowledgenlp-1.29/
- DOI: 10.18653/v1/2025.knowledgenlp-1.29
- Cite (ACL): Anni Zou, Wenhao Yu, Hongming Zhang, Kaixin Ma, Deng Cai, Zhuosheng Zhang, Hai Zhao, and Dong Yu. 2025. DocBench: A Benchmark for Evaluating LLM-based Document Reading Systems. In Proceedings of the 4th International Workshop on Knowledge-Augmented Methods for Natural Language Processing, pages 359–373, Albuquerque, New Mexico, USA. Association for Computational Linguistics.
- Cite (Informal): DocBench: A Benchmark for Evaluating LLM-based Document Reading Systems (Zou et al., KnowledgeNLP 2025)
- PDF: https://preview.aclanthology.org/moar-dois/2025.knowledgenlp-1.29.pdf