Natural Context Drift Undermines the Natural Language Understanding of Large Language Models

Yulong Wu, Viktor Schlegel, Riza Batista-Navarro


Abstract
How does the natural evolution of context paragraphs affect Question Answering (QA) in generative Large Language Models (LLMs)? To address this question, we propose a framework for curating naturally evolved, human-edited variants of reading passages from contemporary QA benchmarks and for analysing LLM performance across a range of semantic similarity scores, which quantify how closely each variant aligns with the Wikipedia content on the same article topic that the LLM saw during pretraining. Using this framework, we evaluate 6 QA datasets and 8 LLMs with publicly available training data. Our experiments reveal that LLM performance declines as reading passages naturally diverge from the versions encountered during pretraining, even when the question and all necessary information remain present at inference time. For instance, average accuracy on BoolQ drops by over 30% from the highest to the lowest similarity bin. This finding suggests that natural text evolution may pose a significant challenge to the language understanding capabilities of fully open-source LLMs.
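As a rough illustration of the analysis the abstract describes, the sketch below scores each evolved passage against its pretraining-era Wikipedia version, buckets examples by similarity, and reports accuracy per bucket. The embedding model, similarity metric, and bin edges are illustrative assumptions, not the authors' released implementation.

```python
# A minimal sketch (assumed details, not the paper's code) of binning QA
# accuracy by how closely an evolved passage matches the pretraining version.
from collections import defaultdict

from sentence_transformers import SentenceTransformer, util

# Assumed embedding model; the paper does not specify this choice here.
embedder = SentenceTransformer("all-MiniLM-L6-v2")


def similarity(pretraining_passage: str, evolved_passage: str) -> float:
    """Cosine similarity between embeddings of the two passage versions."""
    emb = embedder.encode(
        [pretraining_passage, evolved_passage], convert_to_tensor=True
    )
    return util.cos_sim(emb[0], emb[1]).item()


def accuracy_by_similarity_bin(examples, n_bins: int = 5):
    """examples: iterable of (pretraining_passage, evolved_passage, is_correct).

    Returns a dict mapping bin index (0 = least similar) to accuracy,
    mirroring the highest-vs-lowest-bin comparison reported in the abstract.
    """
    hits, totals = defaultdict(int), defaultdict(int)
    for old_passage, new_passage, correct in examples:
        # Map a similarity in [0, 1] onto equal-width bins.
        b = min(int(similarity(old_passage, new_passage) * n_bins), n_bins - 1)
        totals[b] += 1
        hits[b] += int(correct)
    return {b: hits[b] / totals[b] for b in sorted(totals)}
```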
Anthology ID:
2025.findings-emnlp.65
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2025
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
1248–1259
URL:
https://preview.aclanthology.org/name-variant-enfa-fane/2025.findings-emnlp.65/
DOI:
10.18653/v1/2025.findings-emnlp.65
Cite (ACL):
Yulong Wu, Viktor Schlegel, and Riza Batista-Navarro. 2025. Natural Context Drift Undermines the Natural Language Understanding of Large Language Models. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 1248–1259, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
Natural Context Drift Undermines the Natural Language Understanding of Large Language Models (Wu et al., Findings 2025)
PDF:
https://preview.aclanthology.org/name-variant-enfa-fane/2025.findings-emnlp.65.pdf
Checklist:
2025.findings-emnlp.65.checklist.pdf