Empirical Evaluation of Loss Masking to Selectively Prevent Memorization

Tagore Rao Kosireddy, Evan Lucas


Abstract
Large language models are known to memorize training data under certain training conditions. It can be desirable to selectively prevent personal information from being memorized, and one method that has been proposed for doing so is loss masking. To the best of the authors' knowledge, at the time of writing, this method has been alluded to but has not been thoroughly evaluated empirically. We describe the method of loss masking and demonstrate its performance through a set of experiments on a small autoregressive language model. We base one experiment on previous work finding memorized personal information in language models and another on searching for backdoor watermarking trigger words and phrases. Overall, we find that loss masking is highly effective at selectively preventing the memorization of sensitive information.
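The abstract names loss masking without spelling out its mechanics. As a rough illustration only, the sketch below shows one common way such a scheme can be implemented in a PyTorch-style causal language model training loop: the per-token cross-entropy is computed without reduction, and contributions from target tokens flagged as sensitive are zeroed out so they produce no gradient signal. The function name, tensor shapes, and masking convention here are our assumptions for illustration, not the paper's code.

```python
import torch
import torch.nn.functional as F

def masked_lm_loss(logits, labels, sensitive_mask):
    """Causal LM cross-entropy with loss masking (illustrative sketch).

    logits:         (batch, seq_len, vocab_size) model outputs
    labels:         (batch, seq_len) target token ids
    sensitive_mask: (batch, seq_len) bool, True where the target token
                    is sensitive and should not contribute to the loss
    """
    # Shift so each position predicts the next token (standard causal LM).
    shift_logits = logits[:, :-1, :].contiguous()
    shift_labels = labels[:, 1:].contiguous()
    shift_mask = sensitive_mask[:, 1:].contiguous()

    # Per-token cross-entropy, no reduction yet.
    per_token = F.cross_entropy(
        shift_logits.view(-1, shift_logits.size(-1)),
        shift_labels.view(-1),
        reduction="none",
    ).view_as(shift_labels).float()

    # Zero the loss on sensitive targets so no gradient encourages
    # the model to reproduce them; average over the kept tokens only.
    keep = (~shift_mask).float()
    return (per_token * keep).sum() / keep.sum().clamp(min=1.0)

# Hypothetical usage: mask a span of sensitive tokens in one sequence.
batch, seq_len, vocab = 2, 8, 100
logits = torch.randn(batch, seq_len, vocab, requires_grad=True)
labels = torch.randint(0, vocab, (batch, seq_len))
mask = torch.zeros(batch, seq_len, dtype=torch.bool)
mask[0, 3:6] = True  # positions holding sensitive tokens
loss = masked_lm_loss(logits, labels, mask)
loss.backward()  # gradients carry no signal from the masked targets
```

Normalizing by the count of kept tokens, rather than all tokens, keeps the loss scale stable when many tokens are masked. An equivalent effect can be obtained in Hugging Face-style trainers by setting masked target positions to cross_entropy's default ignore_index of -100.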
Anthology ID:
2025.l2m2-1.11
Volume:
Proceedings of the First Workshop on Large Language Model Memorization (L2M2)
Month:
August
Year:
2025
Address:
Vienna, Austria
Editors:
Robin Jia, Eric Wallace, Yangsibo Huang, Tiago Pimentel, Pratyush Maini, Verna Dankers, Johnny Wei, Pietro Lesci
Venues:
L2M2 | WS
Publisher:
Association for Computational Linguistics
Pages:
142–149
URL:
https://preview.aclanthology.org/landing_page/2025.l2m2-1.11/
DOI:
10.18653/v1/2025.l2m2-1.11
Cite (ACL):
Tagore Rao Kosireddy and Evan Lucas. 2025. Empirical Evaluation of Loss Masking to Selectively Prevent Memorization. In Proceedings of the First Workshop on Large Language Model Memorization (L2M2), pages 142–149, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
Empirical Evaluation of Loss Masking to Selectively Prevent Memorization (Kosireddy & Lucas, L2M2 2025)
PDF:
https://preview.aclanthology.org/landing_page/2025.l2m2-1.11.pdf