A General Framework to Enhance Fine-tuning-based LLM Unlearning
Jie Ren, Zhenwei Dai, Xianfeng Tang, Hui Liu, Jingying Zeng, Zhen Li, Rahul Goutam, Suhang Wang, Yue Xing, Qi He, Hui Liu
Abstract
Unlearning has been proposed to remove copyrighted and privacy-sensitive data from Large Language Models (LLMs). Existing approaches primarily rely on fine-tuning-based methods, which can be categorized into gradient ascent-based (GA-based) and suppression-based methods. However, they often degrade model utility (the ability to respond to normal prompts). In this work, we aim to develop a general framework that enhances the utility of fine-tuning-based unlearning methods. To achieve this goal, we first investigate the property shared by GA-based and suppression-based methods. We find that GA-based methods unlearn by distinguishing the target data (i.e., the data to be removed) and suppressing related generations, which is essentially the same strategy employed by suppression-based methods. Inspired by this finding, we introduce Gated Representation UNlearning (GRUN), which has two components: a soft gate function for distinguishing target data and a suppression module using Representation Fine-tuning (ReFT) to adjust representations rather than model parameters. Experiments show that GRUN significantly improves both unlearning and utility. Moreover, it generalizes across fine-tuning-based methods, is efficient, and is promising for sequential unlearning.
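The abstract describes GRUN as a soft gate composed with a ReFT-based suppression edit applied to hidden representations of a frozen LLM. Below is a minimal PyTorch sketch of that idea; the class name, the LoReFT-style parameterization, and all hyperparameters (e.g., `low_rank_dim`) are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch (not the authors' code): a gated, LoReFT-style edit on one
# hidden state. gate(h) is meant to be close to 1 on representations of the
# target (to-be-forgotten) data and close to 0 elsewhere, so normal prompts
# pass through almost unchanged.
import torch
import torch.nn as nn


class GatedReFTIntervention(nn.Module):  # hypothetical name
    def __init__(self, hidden_dim: int, low_rank_dim: int = 8):
        super().__init__()
        # Soft gate: one score in (0, 1) per token representation.
        self.gate = nn.Sequential(nn.Linear(hidden_dim, 1), nn.Sigmoid())
        # LoReFT-style edit h + R^T (W h + b - R h) with rank low_rank_dim.
        self.R = nn.Linear(hidden_dim, low_rank_dim, bias=False)
        self.W = nn.Linear(hidden_dim, low_rank_dim)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (batch, seq_len, hidden_dim), a hidden state from a chosen layer.
        g = self.gate(h)                                   # (batch, seq_len, 1)
        delta = (self.W(h) - self.R(h)) @ self.R.weight    # back to hidden_dim
        return h + g * delta                               # gated suppression edit


# Usage: attach as a forward hook on one transformer layer; only the gate and
# the low-rank matrices are trained, while the base LLM stays frozen.
if __name__ == "__main__":
    iv = GatedReFTIntervention(hidden_dim=4096)
    h = torch.randn(2, 16, 4096)
    print(iv(h).shape)  # torch.Size([2, 16, 4096])
```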
- Anthology ID:
- 2025.findings-acl.949
- Volume:
- Findings of the Association for Computational Linguistics: ACL 2025
- Month:
- July
- Year:
- 2025
- Address:
- Vienna, Austria
- Editors:
- Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
- Venues:
- Findings | WS
- Publisher:
- Association for Computational Linguistics
- Pages:
- 18464–18476
- URL:
- https://preview.aclanthology.org/ingestion-acl-25/2025.findings-acl.949/
- Cite (ACL):
- Jie Ren, Zhenwei Dai, Xianfeng Tang, Hui Liu, Jingying Zeng, Zhen Li, Rahul Goutam, Suhang Wang, Yue Xing, Qi He, and Hui Liu. 2025. A General Framework to Enhance Fine-tuning-based LLM Unlearning. In Findings of the Association for Computational Linguistics: ACL 2025, pages 18464–18476, Vienna, Austria. Association for Computational Linguistics.
- Cite (Informal):
- A General Framework to Enhance Fine-tuning-based LLM Unlearning (Ren et al., Findings 2025)
- PDF:
- https://preview.aclanthology.org/ingestion-acl-25/2025.findings-acl.949.pdf