Hypothetical Documents or Knowledge Leakage? Rethinking LLM-based Query Expansion

Yejun Yoon, Jaeyoon Jung, Seunghyun Yoon, Kunwoo Park


Abstract
Query expansion methods powered by large language models (LLMs) have demonstrated effectiveness in zero-shot retrieval tasks. These methods assume that LLMs can generate hypothetical documents that, when incorporated into a query vector, enhance the retrieval of real evidence. However, we challenge this assumption by investigating whether knowledge leakage in benchmarks contributes to the observed performance gains. Using fact verification as a testbed, we analyze whether the generated documents contain information entailed by ground-truth evidence and assess their impact on performance. Our findings indicate that, on average, performance improvements consistently occurred for claims whose generated documents included sentences entailed by gold evidence. This suggests that knowledge leakage may be present in fact-verification benchmarks, potentially inflating the perceived performance of LLM-based query expansion methods.
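The assumption the abstract questions, that a generated hypothetical document folded into the query vector helps retrieve real evidence, can be illustrated with a minimal sketch. Everything here is an illustrative assumption rather than the authors' implementation: `embed` is a toy bag-of-words stand-in for a dense encoder, the hypothetical document is hard-coded instead of produced by an LLM, and the function names and the tiny corpus are invented for the example.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words vector (assumed stand-in for a real dense encoder)."""
    return Counter(text.lower().split())


def average(vectors):
    """Element-wise mean of sparse vectors stored as dicts."""
    total = Counter()
    for vec in vectors:
        total.update(vec)
    return {term: weight / len(vectors) for term, weight in total.items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def expand_query(query, hypothetical_docs):
    """Average the query vector with the vectors of the generated documents."""
    return average([embed(query)] + [embed(doc) for doc in hypothetical_docs])


def rank(query_vec, corpus):
    """Sort corpus documents by similarity to the query vector, best first."""
    return sorted(corpus, key=lambda doc: cosine(query_vec, embed(doc)), reverse=True)


# Hypothetical data, invented for illustration only.
corpus = [
    "who wrote the novel dracula",
    "fleming observed mould killing bacteria in 1928",
]
query = "who discovered penicillin"
generated = ["alexander fleming discovered penicillin in 1928"]

plain_top = rank(embed(query), corpus)[0]
expanded_top = rank(expand_query(query, generated), corpus)[0]
```

In this toy setup the expanded query ranks the evidence sentence first only because the generated text already contains the answer terms ("fleming", "1928"). That is the situation the abstract flags: when the generated document restates what the gold evidence says, the gain may reflect knowledge the model already holds about the benchmark rather than the expansion technique itself.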
Anthology ID:
2025.findings-acl.980
Volume:
Findings of the Association for Computational Linguistics: ACL 2025
Month:
July
Year:
2025
Address:
Vienna, Austria
Editors:
Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
19170–19187
URL:
https://preview.aclanthology.org/display_plenaries/2025.findings-acl.980/
Cite (ACL):
Yejun Yoon, Jaeyoon Jung, Seunghyun Yoon, and Kunwoo Park. 2025. Hypothetical Documents or Knowledge Leakage? Rethinking LLM-based Query Expansion. In Findings of the Association for Computational Linguistics: ACL 2025, pages 19170–19187, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
Hypothetical Documents or Knowledge Leakage? Rethinking LLM-based Query Expansion (Yoon et al., Findings 2025)
PDF:
https://preview.aclanthology.org/display_plenaries/2025.findings-acl.980.pdf