Answer When Needed, Forget When Not: Language Models Pretend to Forget via In-Context Knowledge Unlearning

Shota Takashiro, Takeshi Kojima, Andrew Gambardella, Qi Cao, Yusuke Iwasawa, Yutaka Matsuo


Abstract
As large language models (LLMs) are applied across diverse domains, the ability to selectively unlearn specific information is becoming increasingly essential. For instance, LLMs are expected to provide confidential information to authorized internal users, such as employees or trusted partners, while withholding it from external users, including the general public and unauthorized entities. Therefore, we propose a novel method termed "in-context knowledge unlearning", which enables the model to selectively forget information at test time based on the query context. Our method fine-tunes pre-trained LLMs to enable prompt unlearning of target knowledge within the context, while preserving unrelated information. Experiments on the TOFU, AGE, and RWKU datasets using Llama2-7B/13B and Mistral-7B models demonstrate that our method achieves up to 95% forget accuracy while retaining 80% of unrelated knowledge, significantly outperforming baselines in both in-domain and out-of-domain scenarios. Further investigation of the model's internal behavior revealed that fine-tuned LLMs generate correct predictions in the middle layers and preserve them up to the final layer, yet the decision to forget is made only at the last layer, i.e., "LLMs pretend to forget". Our findings offer valuable insights into improving the robustness of unlearning mechanisms in LLMs, laying a foundation for future research in the field.
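As a loose illustration of the training setup described in the abstract (not the authors' released code), the sketch below builds paired forget/retain fine-tuning examples in which an in-context instruction determines whether the model should answer or respond as if the fact were unknown. The prompt templates, the "I don't know." target, and the field names are assumptions made for illustration.

```python
# Hypothetical sketch of constructing fine-tuning data for in-context knowledge
# unlearning: the same question appears twice, once under a "forget" instruction
# (target: an unknown/withheld answer) and once without it (target: the true
# answer). Templates and targets are illustrative, not the paper's exact setup.

FORGET_TEMPLATE = "Forget any knowledge about {topic}.\nQ: {question}\nA:"
ANSWER_TEMPLATE = "Q: {question}\nA:"
UNKNOWN_RESPONSE = "I don't know."

def build_unlearning_pairs(records):
    """records: iterable of dicts with 'topic', 'question', and 'answer' keys."""
    examples = []
    for r in records:
        # Forget case: the in-context instruction asks the model to withhold this topic.
        examples.append({
            "prompt": FORGET_TEMPLATE.format(topic=r["topic"], question=r["question"]),
            "target": UNKNOWN_RESPONSE,
        })
        # Retain case: no forget instruction, so the model should answer normally.
        examples.append({
            "prompt": ANSWER_TEMPLATE.format(question=r["question"]),
            "target": r["answer"],
        })
    return examples

if __name__ == "__main__":
    data = [{
        "topic": "Project X budget",
        "question": "What is the budget of Project X?",
        "answer": "$2.4 million.",
    }]
    for ex in build_unlearning_pairs(data):
        print(ex["prompt"], "->", ex["target"])
```

In practice, such pairs would be fed to a standard supervised fine-tuning loop so that the same query yields either the answer or the forgotten response depending on the context.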
Anthology ID:
2025.findings-acl.1276
Volume:
Findings of the Association for Computational Linguistics: ACL 2025
Month:
July
Year:
2025
Address:
Vienna, Austria
Editors:
Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
24872–24885
URL:
https://preview.aclanthology.org/landing_page/2025.findings-acl.1276/
Cite (ACL):
Shota Takashiro, Takeshi Kojima, Andrew Gambardella, Qi Cao, Yusuke Iwasawa, and Yutaka Matsuo. 2025. Answer When Needed, Forget When Not: Language Models Pretend to Forget via In-Context Knowledge Unlearning. In Findings of the Association for Computational Linguistics: ACL 2025, pages 24872–24885, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
Answer When Needed, Forget When Not: Language Models Pretend to Forget via In-Context Knowledge Unlearning (Takashiro et al., Findings 2025)
PDF:
https://preview.aclanthology.org/landing_page/2025.findings-acl.1276.pdf