LNE-Blocking: An Efficient Framework for Contamination Mitigation Evaluation on Large Language Models

Ruijie Hou, Yueyang Jiao, Hanxu Hu, Yingming Li, Wai Lam, Huajian Zhang, Hongyuan Lu


Abstract
The problem of data contamination is now almost inevitable during the development of large language models (LLMs), with the training data commonly integrating those evaluation benchmarks even unintentionally. This problem subsequently makes it hard to benchmark LLMs fairly. Instead of constructing contamination-free datasets (quite hard), we propose a novel framework, LNE-Blocking, to restore model performance prior to contamination on potentially leaked datasets. Our framework consists of two components: contamination detection and disruption operation. For the prompt, the framework first uses the contamination detection method, LNE, to assess the extent of contamination in the model. Based on this, it adjusts the intensity of the disruption operation, Blocking, to elicit non-memorized responses from the model. Our framework is the first to efficiently restore the model’s greedy decoding performance. This comes with a strong performance on multiple datasets with potential leakage risks, and it consistently achieves stable recovery results across different models and varying levels of data contamination. We release the code at https://github.com/RuijieH/LNE-Blocking to facilitate research.
Anthology ID:
2025.findings-emnlp.188
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2025
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
Findings
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
3512–3528
Language:
URL:
https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.188/
DOI:
10.18653/v1/2025.findings-emnlp.188
Bibkey:
Cite (ACL):
Ruijie Hou, Yueyang Jiao, Hanxu Hu, Yingming Li, Wai Lam, Huajian Zhang, and Hongyuan Lu. 2025. LNE-Blocking: An Efficient Framework for Contamination Mitigation Evaluation on Large Language Models. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 3512–3528, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
LNE-Blocking: An Efficient Framework for Contamination Mitigation Evaluation on Large Language Models (Hou et al., Findings 2025)
Copy Citation:
PDF:
https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.188.pdf
Checklist:
 2025.findings-emnlp.188.checklist.pdf