REMIND: Memorization and Unlearning in LLMs Through the Lens of Input Loss Landscapes

Liran Cohen, Yaniv Nemcovsky, Avi Mendelson


Abstract
Understanding how large language models (LLMs) store, retain, and remove knowledge is critical for interpretability, reliability, and privacy compliance. We reveal a key phenomenon: machine unlearning imprints distinct geometric signatures in the model’s input loss landscape (ILL), with unlearned examples forming flat, low-curvature plateaus that contrast sharply with the high-curvature basins of retained or unseen examples. Remarkably, these patterns emerge even when pointwise losses overlap, exposing residual memorization through input-output behavior alone. Building on this insight, we introduce **REMIND (Residual Memorization in Neighborhood Dynamics)**, a framework that diagnoses memorization states (retained, forgotten, holdout) by probing local ILL curvature over semantically coherent neighborhoods. REMIND operates using only loss queries and a novel embedding-proximity perturbation method to generate controlled, interpretable variants. In evaluations, REMIND achieves 82% multi-class ROC-AUC, outperforming baselines like ROUGE-L and MIN-K%++, with roughly 2× higher AUC at 1% FPR, and remains robust on paraphrased inputs. This neighborhood-level geometric analysis provides a practical, interpretable lens on LLM knowledge retention and unlearning, detecting subtle residual signals missed by pointwise or aggregated metrics.
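The abstract only summarizes how REMIND works, so what follows is a minimal, hypothetical Python sketch of the general idea rather than the authors' implementation. It queries losses only, builds neighbors by swapping one token for a nearby token in embedding space, and summarizes the neighborhood with a single statistic. The function names (`perturb`, `curvature_proxy`), the toy embeddings and losses, and the use of "mean neighbor loss minus original loss" as a curvature proxy are all assumptions made for illustration. The paper's actual perturbation scheme and neighborhood features may differ.

```python
import math
import random
from typing import Callable, Dict, List, Sequence

# Hypothetical toy setup: token -> embedding vector (stands in for a model's embedding table).
Embeddings = Dict[str, Sequence[float]]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def nearest_tokens(token: str, emb: Embeddings, k: int) -> List[str]:
    """The k vocabulary tokens closest to `token` by cosine similarity, excluding itself."""
    others = [t for t in emb if t != token]
    others.sort(key=lambda t: cosine(emb[token], emb[t]), reverse=True)
    return others[:k]


def perturb(tokens: List[str], emb: Embeddings, rng: random.Random, k: int = 3) -> List[str]:
    """Hypothetical embedding-proximity perturbation: swap one random position for a nearby token."""
    out = list(tokens)
    positions = [i for i, t in enumerate(out) if t in emb and nearest_tokens(t, emb, k)]
    if positions:
        i = rng.choice(positions)
        out[i] = rng.choice(nearest_tokens(out[i], emb, k))
    return out


def curvature_proxy(tokens: List[str], loss_fn: Callable[[List[str]], float],
                    emb: Embeddings, n_neighbors: int = 16, seed: int = 0) -> float:
    """Assumed proxy: mean loss over perturbed neighbors minus the original loss.

    Near 0 suggests a flat plateau; a large positive value suggests a sharp basin.
    Only loss queries are made, so `loss_fn` could wrap a black-box model.
    """
    rng = random.Random(seed)
    base = loss_fn(tokens)
    neighbor_losses = [loss_fn(perturb(tokens, emb, rng)) for _ in range(n_neighbors)]
    return sum(neighbor_losses) / n_neighbors - base


# Usage with toy data (not from the paper).
emb = {"the": [1.0, 0.0], "a": [0.9, 0.1], "cat": [0.0, 1.0],
       "dog": [0.1, 0.9], "sat": [0.5, 0.5]}
text = ["the", "cat", "sat"]


def flat_loss(t: List[str]) -> float:
    """Same loss everywhere: a plateau."""
    return 2.0


def sharp_loss(t: List[str]) -> float:
    """Loss grows with the number of positions that differ from `text`: a basin."""
    return float(sum(x != y for x, y in zip(t, text)))


print(curvature_proxy(text, flat_loss, emb))   # 0.0
print(curvature_proxy(text, sharp_loss, emb))  # 1.0
```

The signed mean increase is used here because, for small random perturbations around a point, the average loss change tracks local curvature (a basin raises the loss of every neighbor, a plateau leaves it unchanged). In discrete token space this analogy is only approximate.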
Anthology ID: 2026.acl-long.2215
Volume: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 47955–47993
URL: https://preview.aclanthology.org/ingest-acl/2026.acl-long.2215/
Cite (ACL): Liran Cohen, Yaniv Nemcovsky, and Avi Mendelson. 2026. REMIND: Memorization and Unlearning in LLMs Through the Lens of Input Loss Landscapes. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 47955–47993, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): REMIND: Memorization and Unlearning in LLMs Through the Lens of Input Loss Landscapes (Cohen et al., ACL 2026)
PDF: https://preview.aclanthology.org/ingest-acl/2026.acl-long.2215.pdf
Checklist: 2026.acl-long.2215.checklist.pdf