REVS: Unlearning Sensitive Information in Language Models via Rank Editing in the Vocabulary Space

Tomer Ashuach, Martin Tutek, Yonatan Belinkov


Abstract
Language models (LMs) risk inadvertently memorizing and divulging sensitive or personally identifiable information (PII) seen in training data, raising privacy concerns. Current approaches to address this issue involve costly dataset scrubbing, or model filtering through unlearning and model editing, which can be bypassed through extraction attacks. We propose REVS, a novel non-gradient-based method for unlearning sensitive information from LMs. REVS identifies and modifies a small subset of neurons relevant to the constituent tokens that form sensitive information. To adequately evaluate our method on truly sensitive information, we curate three datasets: email and URL datasets naturally memorized by the models, and a synthetic social security number dataset that we tune the models to memorize. Compared to other methods, REVS demonstrates superior performance in unlearning sensitive information and robustness to extraction attacks, while retaining the integrity of the underlying model.
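
As an illustration of what "rank in the vocabulary space" can refer to, the sketch below projects a single MLP neuron's output weights through the unembedding matrix (a logit-lens-style projection) and reports the rank of a target token among all vocabulary logits. This is a minimal, hypothetical sketch, not the paper's algorithm: the model (gpt2), the layer/neuron indices, and the example string are illustrative assumptions.

```python
# Minimal, hypothetical sketch (NOT the paper's method): project one MLP
# neuron's output weights into vocabulary space and read off the rank of a
# target token. Model, layer/neuron indices, and the example string are
# illustrative assumptions only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in model; the paper evaluates larger LMs
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

layer, neuron = 10, 42  # arbitrary example indices
# GPT-2's mlp.c_proj.weight has shape (d_mlp, d_model); row `neuron` is the
# direction this neuron writes into the residual stream.
w_neuron = model.transformer.h[layer].mlp.c_proj.weight[neuron]
U = model.lm_head.weight  # unembedding matrix, shape (vocab_size, d_model)

with torch.no_grad():
    logits = U @ w_neuron                       # (vocab_size,) vocabulary-space view
    order = torch.argsort(logits, descending=True)

target_id = tok.encode(" alice@example.com")[0]  # first sub-token of a fake email
rank = (order == target_id).nonzero().item()
print(f"Neuron {neuron} in layer {layer} ranks the target token at position {rank}")
```

A neuron that assigns a sensitive token a high rank in this projection is a candidate for editing; demoting that rank without gradient updates is the intuition the paper's title points to, though the actual selection and editing procedure is described in the paper itself.
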
Anthology ID: 2025.findings-acl.763
Volume: Findings of the Association for Computational Linguistics: ACL 2025
Month: July
Year: 2025
Address: Vienna, Austria
Editors: Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 14774–14797
URL: https://preview.aclanthology.org/mtsummit-25-ingestion/2025.findings-acl.763/
DOI: 10.18653/v1/2025.findings-acl.763
Cite (ACL): Tomer Ashuach, Martin Tutek, and Yonatan Belinkov. 2025. REVS: Unlearning Sensitive Information in Language Models via Rank Editing in the Vocabulary Space. In Findings of the Association for Computational Linguistics: ACL 2025, pages 14774–14797, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal): REVS: Unlearning Sensitive Information in Language Models via Rank Editing in the Vocabulary Space (Ashuach et al., Findings 2025)
PDF: https://preview.aclanthology.org/mtsummit-25-ingestion/2025.findings-acl.763.pdf