Navigating Large-Scale Document Collections: MuDABench for Multi-Document Analytical QA

Zhanli Li; Yixuan Cao; Lvzhou Luo; Ping Luo

Abstract

This paper introduces the task of analytical question answering over large, semi-structured document collections. We present MuDABench, a benchmark for multi-document analytical QA, where questions require extracting and synthesizing information across numerous documents to perform quantitative analysis. Unlike existing multi-document QA benchmarks that typically require information from only a few documents with limited cross-document reasoning, MuDABench demands extensive inter-document analysis and aggregation. Constructed via distant supervision by leveraging document-level metadata and annotated financial databases, MuDABench comprises over 80,000 pages and 332 analytical QA instances. We also propose an evaluation protocol that measures final answer accuracy and uses intermediate-fact coverage as an auxiliary diagnostic signal for the reasoning process. Experiments reveal that standard RAG systems, which treat all documents as a flat retrieval pool, perform poorly. To address these limitations, we propose a multi-agent workflow that orchestrates planning, extraction, and code generation modules. While this approach substantially improves both process and outcome metrics, a significant gap remains compared to human expert performance. Our analysis identifies two primary bottlenecks: single-document information extraction accuracy and insufficient domain-specific knowledge in current systems. MuDABench is available at https://github.com/Zhanli-Li/MuDABench.
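The abstract contrasts a flat retrieval pool with a workflow that plans, extracts from each document, and aggregates in code. Below is a toy sketch of that general shape, not the authors' implementation. It assumes each document carries company and year metadata. The planner and extractor are hard-coded stand-ins for LLM calls. All names (`Document`, `Plan`, `plan_question`, `select_documents`, `extract_value`, `aggregate`, `answer`) are hypothetical.

```python
import re
from dataclasses import dataclass
from statistics import mean


@dataclass
class Document:
    # Hypothetical document record: metadata fields plus raw text.
    company: str
    year: int
    text: str


@dataclass
class Plan:
    # Hypothetical structured plan: which documents are in scope,
    # which field to extract from each, and how to aggregate the results.
    companies: set
    years: range
    field: str
    aggregate: str  # "mean" or "sum"


def plan_question(question: str) -> Plan:
    # Stand-in for an LLM planner; the plan is hard-coded for the toy question.
    return Plan(companies={"A"}, years=range(2020, 2023), field="revenue", aggregate="mean")


def select_documents(docs, plan):
    # Route by document metadata rather than similarity search over a flat pool:
    # every in-scope document is visited, not only the top-k retrieved chunks.
    return [d for d in docs if d.company in plan.companies and d.year in plan.years]


def extract_value(doc, field):
    # Stand-in for per-document LLM extraction: a regex over "field: number".
    m = re.search(rf"{field}:\s*([\d.]+)", doc.text, re.IGNORECASE)
    return float(m.group(1)) if m else None


def aggregate(values, how):
    # Stand-in for generated analysis code: deterministic arithmetic over extracted facts.
    if how == "mean":
        return mean(values)
    if how == "sum":
        return sum(values)
    raise ValueError(f"unsupported aggregation: {how}")


def answer(question, docs):
    plan = plan_question(question)
    facts = {}  # intermediate facts keyed by (company, year)
    for d in select_documents(docs, plan):
        v = extract_value(d, plan.field)
        if v is not None:
            facts[(d.company, d.year)] = v
    return aggregate(list(facts.values()), plan.aggregate), facts


# Usage: four filings, one of which falls outside the planned year range.
docs = [
    Document("A", 2019, "Revenue: 99.0"),
    Document("A", 2020, "Revenue: 10.0"),
    Document("A", 2021, "Revenue: 12.0"),
    Document("A", 2022, "Revenue: 14.0"),
]
result, facts = answer("What was company A's mean revenue from 2020 to 2022?", docs)
```

Keeping the extracted per-document values in `facts` separates extraction errors from aggregation errors. Under these assumptions, that dictionary is the kind of intermediate output that a fact-coverage diagnostic could inspect.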

Anthology ID: 2026.findings-acl.341
Volume: Findings of the Association for Computational Linguistics: ACL 2026
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 6877–6898
URL: https://preview.aclanthology.org/ingest-acl/2026.findings-acl.341/
Cite (ACL): Zhanli Li, Yixuan Cao, Lvzhou Luo, and Ping Luo. 2026. Navigating Large-Scale Document Collections: MuDABench for Multi-Document Analytical QA. In Findings of the Association for Computational Linguistics: ACL 2026, pages 6877–6898, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): Navigating Large-Scale Document Collections: MuDABench for Multi-Document Analytical QA (Li et al., Findings 2026)
PDF: https://preview.aclanthology.org/ingest-acl/2026.findings-acl.341.pdf
Checklist: 2026.findings-acl.341.checklist.pdf