Unlearning as multi-task optimization: A normalized gradient difference approach with an adaptive learning rate

Xiaomeng Jin; Zhiqi Bu; Bhanukiran Vinzamuri; Anil Ramakrishna; Kai-Wei Chang; Volkan Cevher; Mingyi Hong

Abstract

Machine unlearning has been used to remove unwanted knowledge acquired by large language models (LLMs). In this paper, we examine machine unlearning from an optimization perspective, framing it as a regularized multi-task optimization problem, where one task optimizes a forgetting objective and another optimizes the model performance. In particular, we introduce a normalized gradient difference algorithm, which gives better control over the trade-off between the objectives, and integrate it with a new, automatic learning rate scheduler. We provide a theoretical analysis and empirically demonstrate the superior performance of the proposed method among state-of-the-art unlearning methods on the TOFU and MUSE datasets, while exhibiting stable training.
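The abstract names the normalized gradient difference idea but gives no formula, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes the update direction is the unit-normalized gradient of the retain (performance) loss minus the unit-normalized gradient of the forget loss, so that a descent step lowers the retain loss while raising the forget loss. The function names and the fixed learning rate `lr` are hypothetical; the paper's automatic learning rate scheduler is not reproduced here.

```python
import math


def normalize(vec, eps=1e-12):
    """Scale a gradient vector to (approximately) unit Euclidean norm."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / (norm + eps) for x in vec]


def normalized_gradient_difference(grad_retain, grad_forget):
    """Assumed direction: unit retain gradient minus unit forget gradient.

    Normalizing each gradient keeps either objective from dominating
    the update merely because its gradient has a larger magnitude.
    """
    r = normalize(grad_retain)
    f = normalize(grad_forget)
    return [ri - fi for ri, fi in zip(r, f)]


def step(params, grad_retain, grad_forget, lr):
    """One descent step along the assumed direction with a fixed lr
    (a stand-in for the adaptive scheduler mentioned in the abstract)."""
    direction = normalized_gradient_difference(grad_retain, grad_forget)
    return [p - lr * d for p, d in zip(params, direction)]


if __name__ == "__main__":
    # Toy gradients chosen by hand for illustration only.
    params = [0.0, 0.0]
    print(step(params, grad_retain=[3.0, 4.0], grad_forget=[0.0, 2.0], lr=0.1))
```

In this toy call the two gradients differ in magnitude, yet after normalization each contributes a unit-length vector to the direction, which is the sense in which the trade-off between the objectives becomes easier to control.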

Anthology ID: 2025.naacl-long.563
Volume: Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
Month: April
Year: 2025
Address: Albuquerque, New Mexico
Editors: Luis Chiruzzo, Alan Ritter, Lu Wang
Venue: NAACL
Publisher: Association for Computational Linguistics
Pages: 11278–11294
URL: https://preview.aclanthology.org/Author-page-Marten-During-lu/2025.naacl-long.563/
Cite (ACL): Xiaomeng Jin, Zhiqi Bu, Bhanukiran Vinzamuri, Anil Ramakrishna, Kai-Wei Chang, Volkan Cevher, and Mingyi Hong. 2025. Unlearning as multi-task optimization: A normalized gradient difference approach with an adaptive learning rate. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 11278–11294, Albuquerque, New Mexico. Association for Computational Linguistics.
Cite (Informal): Unlearning as multi-task optimization: A normalized gradient difference approach with an adaptive learning rate (Jin et al., NAACL 2025)
PDF: https://preview.aclanthology.org/Author-page-Marten-During-lu/2025.naacl-long.563.pdf
