More Documents, Same Length: Isolating the Challenge of Multiple Documents in RAG

Shahar Levy, Nir Mazor, Lihi Shalmon, Michael Hassid, Gabriel Stanovsky


Abstract
Retrieval-Augmented Generation (RAG) enhances the accuracy of Large Language Model (LLM) responses by leveraging relevant external documents during generation. Although previous studies noted that retrieving many documents can degrade performance, they did not isolate how the quantity of documents affects performance while controlling for context length. We evaluate various language models on custom datasets derived from a multi-hop QA task. We keep the context length and position of relevant information constant while varying the number of documents, and find that increasing the document count in RAG settings poses significant challenges for most LLMs, reducing performance by up to 20%. However, Qwen2 maintains consistent results across increasing document counts, indicating better multi-document handling capability. Finally, our results indicate that processing multiple documents is a separate challenge from handling long contexts. We will publicly release the datasets and code upon publication to facilitate further research in multi-document retrieval.
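The controlled setup described in the abstract — varying the number of documents while holding the total context length and the position of the relevant information fixed — can be sketched as follows. This is a minimal illustration, not the authors' released code: the function name, the character-level length budget (standing in for tokens), and the placement of the relevant document first are all assumptions for the sake of the example.

```python
# Sketch: split a fixed context budget across a varying number of documents,
# keeping the relevant document at a constant position (here: first).
# Illustrative only; the paper's actual data construction may differ.

def build_context(relevant_doc, distractor_docs, num_docs, total_len):
    """Fill a fixed length budget with `num_docs` documents.

    The relevant document always comes first, so only the document
    count varies between experimental conditions.
    """
    per_doc = total_len // num_docs
    docs = [relevant_doc[:per_doc]]  # relevant info at a fixed position
    for d in distractor_docs:
        if len(docs) >= num_docs:
            break
        docs.append(d[:per_doc])
    return "\n\n".join(docs)

# Two conditions: same content budget, different document counts.
relevant = "R" * 100
distractors = ["D" * 100 for _ in range(9)]

few_docs = build_context(relevant, distractors, num_docs=2, total_len=100)
many_docs = build_context(relevant, distractors, num_docs=10, total_len=100)
```

Under this construction, both conditions carry the same amount of document text, so any performance gap between `few_docs` and `many_docs` can be attributed to the number of documents rather than to context length.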
Anthology ID:
2025.findings-emnlp.1064
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2025
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
19539–19547
URL:
https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.1064/
DOI:
10.18653/v1/2025.findings-emnlp.1064
Cite (ACL):
Shahar Levy, Nir Mazor, Lihi Shalmon, Michael Hassid, and Gabriel Stanovsky. 2025. More Documents, Same Length: Isolating the Challenge of Multiple Documents in RAG. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 19539–19547, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
More Documents, Same Length: Isolating the Challenge of Multiple Documents in RAG (Levy et al., Findings 2025)
PDF:
https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.1064.pdf
Checklist:
 2025.findings-emnlp.1064.checklist.pdf