2025
Unlearning as multi-task optimization: A normalized gradient difference approach with an adaptive learning rate
Xiaomeng Jin | Zhiqi Bu | Bhanukiran Vinzamuri | Anil Ramakrishna | Kai-Wei Chang | Volkan Cevher | Mingyi Hong
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
Machine unlearning has been used to remove unwanted knowledge acquired by large language models (LLMs). In this paper, we examine machine unlearning from an optimization perspective, framing it as a regularized multi-task optimization problem, where one task optimizes a forgetting objective and another optimizes model performance. In particular, we introduce a normalized gradient difference algorithm that gives better control over the trade-off between the two objectives, and we integrate a new, automatic learning rate scheduler. We provide a theoretical analysis and empirically demonstrate the superior performance of our method among state-of-the-art unlearning methods on the TOFU and MUSE datasets, while exhibiting stable training.
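To make the normalized gradient difference idea concrete, here is a minimal sketch of one plausible update step: descend on the retain objective and ascend on the forget objective, with each task gradient normalized so neither dominates. This is an illustration under stated assumptions, not the paper's implementation; the function and argument names (`model`, `forget_loss`, `retain_loss`, `lr`) are hypothetical, and the adaptive learning rate scheduler is omitted.

```python
import torch

def normalized_gradient_difference_step(model, forget_loss, retain_loss, lr=1e-5):
    """Sketch of a normalized-gradient-difference update: combine the
    retain-descent and forget-ascent directions after normalizing each
    task gradient to unit global L2 norm."""
    params = [p for p in model.parameters() if p.requires_grad]

    # Per-task gradients (retain the graph so we can differentiate twice).
    g_forget = torch.autograd.grad(forget_loss, params, retain_graph=True)
    g_retain = torch.autograd.grad(retain_loss, params)

    # Global L2 norm of each task gradient across all parameters.
    norm_f = torch.sqrt(sum(g.pow(2).sum() for g in g_forget)) + 1e-12
    norm_r = torch.sqrt(sum(g.pow(2).sum() for g in g_retain)) + 1e-12

    with torch.no_grad():
        for p, gf, gr in zip(params, g_forget, g_retain):
            # Difference of normalized gradients: preserve utility, remove content.
            p -= lr * (gr / norm_r - gf / norm_f)
```

Normalizing before taking the difference is what gives explicit control over the trade-off: both objectives contribute updates of comparable magnitude regardless of their raw gradient scales.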
SemEval-2025 Task 4: Unlearning sensitive content from Large Language Models
Anil Ramakrishna | Yixin Wan | Xiaomeng Jin | Kai-Wei Chang | Zhiqi Bu | Bhanukiran Vinzamuri | Volkan Cevher | Mingyi Hong | Rahul Gupta
Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)
We introduce SemEval-2025 Task 4: unlearning sensitive content from Large Language Models (LLMs). The task features 3 subtasks for LLM unlearning spanning different use cases: (1) unlearn long-form synthetic creative documents spanning different genres; (2) unlearn short-form synthetic biographies containing personally identifiable information (PII), including fake names, phone numbers, SSNs, email addresses, and home addresses; and (3) unlearn real documents sampled from the target model's training dataset. We received over 100 submissions from over 30 institutions, and we summarize the key techniques and lessons in this paper.
2023
Adversarial Robustness for Large Language NER models using Disentanglement and Word Attributions
Xiaomeng Jin | Bhanukiran Vinzamuri | Sriram Venkatapathy | Heng Ji | Pradeep Natarajan
Findings of the Association for Computational Linguistics: EMNLP 2023
Large language models (LLMs) have been widely used for several applications, such as question answering, text classification, and clustering. While the preliminary results across these tasks look promising, recent work has shown that LLMs perform poorly on complex Named Entity Recognition (NER) tasks compared to fine-tuned pre-trained language models (PLMs). To support wider adoption of LLMs, our paper investigates the robustness of such LLM NER models and their instruction fine-tuned variants to adversarial attacks. In particular, we propose a novel attack that relies on disentanglement and word attribution techniques, where the former aids in learning an embedding that captures entity and non-entity influences separately, and the latter aids in identifying important words across both components. This is in stark contrast to most techniques, which primarily leverage non-entity words for perturbations, limiting the space explored when synthesizing effective adversarial examples. Adversarial training based on our method improves the F1 score over the original LLM NER model by 8% and 18% on the CoNLL-2003 and OntoNotes 5.0 datasets, respectively.
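As an illustration of the word attribution step the abstract describes, the sketch below scores each input token's influence on the model's prediction so that the highest-scoring positions can serve as perturbation candidates. The gradient-times-embedding saliency used here is a common attribution choice assumed for illustration, not necessarily the paper's technique, and all names (`model`, `input_ids`, `labels`) are hypothetical placeholders for a Hugging Face-style NER model.

```python
import torch

def attribution_scores(model, input_ids, labels):
    """Sketch: rank tokens by gradient-x-embedding saliency so the
    most influential words can be targeted for adversarial perturbation."""
    embeds = model.get_input_embeddings()(input_ids)  # (batch, seq, dim)
    embeds.retain_grad()                              # keep grad on a non-leaf tensor
    loss = model(inputs_embeds=embeds, labels=labels).loss
    loss.backward()
    # Saliency per token: |gradient . embedding|, summed over the hidden dim.
    return (embeds.grad * embeds).abs().sum(dim=-1)   # (batch, seq)
```

In an attack of this general shape, the top-k positions by score would then be replaced with synonym or character-level perturbations to synthesize adversarial examples, which can in turn be used for the adversarial training the abstract reports.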