Facts Fade Fast: Evaluating Memorization of Outdated Medical Knowledge in Large Language Models

Juraj Vladika, Mahdi Dhaini, Florian Matthes


Abstract
The growing capabilities of Large Language Models (LLMs) can enhance healthcare by assisting medical researchers and physicians and by improving access to health services for patients. LLMs encode extensive knowledge within their parameters, including medical knowledge derived from many sources. However, this knowledge can become outdated over time, posing challenges in keeping up with evolving medical recommendations and research. This can lead LLMs to provide outdated health advice or to fail at medical reasoning tasks. To address this gap, our study introduces two novel biomedical question-answering (QA) datasets derived from medical systematic literature reviews: MedRevQA, a general dataset of 16,501 biomedical QA pairs, and MedChangeQA, a subset of 512 QA pairs whose verdict changed over time. By evaluating eight popular LLMs, we find that all models exhibit memorization of outdated knowledge to some extent. We provide deeper insights and analysis, paving the way for future research on this challenging aspect of LLMs.
Anthology ID:
2025.findings-emnlp.487
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2025
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rosé, Violet Peng
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
9161–9174
URL:
https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.487/
DOI:
10.18653/v1/2025.findings-emnlp.487
Cite (ACL):
Juraj Vladika, Mahdi Dhaini, and Florian Matthes. 2025. Facts Fade Fast: Evaluating Memorization of Outdated Medical Knowledge in Large Language Models. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 9161–9174, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
Facts Fade Fast: Evaluating Memorization of Outdated Medical Knowledge in Large Language Models (Vladika et al., Findings 2025)
PDF:
https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.487.pdf
Checklist:
2025.findings-emnlp.487.checklist.pdf