Abstract
Pre-trained language models acquire knowledge from vast amounts of text data, which can inadvertently contain sensitive information. To mitigate the presence of undesirable knowledge, the task of knowledge unlearning becomes crucial for language models. Previous research relies on gradient ascent methods to achieve knowledge unlearning, which are simple and effective. However, this approach computes gradients for all tokens in the sequence, potentially compromising the general ability of language models. To overcome this limitation, we propose an adaptive objective that computes gradients with fine-grained control, specifically targeting sensitive tokens. Our adaptive objective is pluggable, ensuring simplicity and enabling extension to regularization-based frameworks that use non-target data or other models to preserve general ability. Through extensive experiments targeting the removal of typical sensitive data, we demonstrate that our proposed method enhances the general ability of language models while achieving knowledge unlearning. Additionally, it can adapt to behavior alignment, eliminating all undesirable knowledge within a specific domain.
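The abstract contrasts plain gradient ascent, which raises the loss on every token of a forget sequence, with an objective that restricts the update to sensitive tokens. As a minimal sketch of that general idea only, assuming a PyTorch causal LM whose logits are already aligned with the labels, and a hypothetical per-token `sensitive_mask` supplied by the caller (the function name and the mask construction are illustrative assumptions, not the paper's actual adaptive objective):

```python
import torch
import torch.nn.functional as F

def selective_unlearning_loss(logits, labels, sensitive_mask):
    """Negated cross-entropy restricted to flagged tokens (token-selective gradient ascent).

    logits:         (batch, seq_len, vocab) model outputs, already shifted to align with labels
    labels:         (batch, seq_len) target token ids
    sensitive_mask: (batch, seq_len) float mask, 1.0 for tokens to unlearn, 0.0 otherwise
    """
    # Per-token cross-entropy, kept unreduced so individual tokens can be weighted.
    per_token = F.cross_entropy(
        logits.view(-1, logits.size(-1)),
        labels.view(-1),
        reduction="none",
    ).view(labels.shape)

    # Keep only the flagged tokens; non-sensitive tokens contribute no gradient.
    masked = per_token * sensitive_mask

    # Negating the mean turns minimisation into gradient ascent on the flagged tokens.
    return -masked.sum() / sensitive_mask.sum().clamp(min=1.0)
```

Minimising this value increases the model's loss on the flagged tokens only, leaving the rest of the sequence untouched; the paper's adaptive objective concerns how such tokens are identified and weighted, which this sketch does not attempt to reproduce.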
- Anthology ID: 2024.emnlp-main.566
- Volume: Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
- Month: November
- Year: 2024
- Address: Miami, Florida, USA
- Editors: Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
- Venue: EMNLP
- Publisher: Association for Computational Linguistics
- Pages: 10141–10155
- URL: https://preview.aclanthology.org/add_missing_videos/2024.emnlp-main.566/
- DOI: 10.18653/v1/2024.emnlp-main.566
- Cite (ACL): XiaoHua Feng, Chaochao Chen, Yuyuan Li, and Zibin Lin. 2024. Fine-grained Pluggable Gradient Ascent for Knowledge Unlearning in Language Models. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 10141–10155, Miami, Florida, USA. Association for Computational Linguistics.
- Cite (Informal): Fine-grained Pluggable Gradient Ascent for Knowledge Unlearning in Language Models (Feng et al., EMNLP 2024)
- PDF: https://preview.aclanthology.org/add_missing_videos/2024.emnlp-main.566.pdf