Bhanukiran Vinzamuri
2025
Unlearning as multi-task optimization: A normalized gradient difference approach with an adaptive learning rate
Xiaomeng Jin | Zhiqi Bu | Bhanukiran Vinzamuri | Anil Ramakrishna | Kai-Wei Chang | Volkan Cevher | Mingyi Hong
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
Machine unlearning has been used to remove unwanted knowledge acquired by large language models (LLMs). In this paper, we examine machine unlearning from an optimization perspective, framing it as a regularized multi-task optimization problem in which one task optimizes a forgetting objective and another optimizes model performance. In particular, we introduce a normalized gradient difference algorithm, which gives better control over the trade-off between the two objectives, and integrate it with a new automatic learning rate scheduler. We provide a theoretical analysis and empirically demonstrate the superior performance of the proposed method among state-of-the-art unlearning methods on the TOFU and MUSE datasets, along with stable training.
2023
Adversarial Robustness for Large Language NER models using Disentanglement and Word Attributions
Xiaomeng Jin | Bhanukiran Vinzamuri | Sriram Venkatapathy | Heng Ji | Pradeep Natarajan
Findings of the Association for Computational Linguistics: EMNLP 2023
Large language models (LLMs) have been widely used for applications such as question answering, text classification, and clustering. While preliminary results on these tasks look promising, recent work has shown that LLMs perform poorly on complex Named Entity Recognition (NER) tasks in comparison to fine-tuned pre-trained language models (PLMs). To support wider adoption of LLMs, our paper investigates the robustness of such LLM NER models and their instruction fine-tuned variants to adversarial attacks. In particular, we propose a novel attack that relies on disentanglement and word attribution techniques: the former aids in learning an embedding that captures entity and non-entity influences separately, and the latter aids in identifying important words across both components. This is in stark contrast to most techniques, which primarily leverage non-entity words for perturbations, limiting the space explored to synthesize effective adversarial examples. Adversarial training based on our method improves the F1 score over the original LLM NER model by 8% and 18% on the CoNLL-2003 and OntoNotes 5.0 datasets, respectively.