Forget the Unneeded: Backdooring Large Language Models via Contrastive-enhanced Machine Unlearning
Shiji Yang, Shu Zhao, Congyao Mei, Zhen Yang, Jie Chen, Fulan Qian, Zhen Duan, Yanping Zhang
Abstract
Prompt tuning for Large Language Models (LLMs) is vulnerable to backdoor attacks. Existing methods show that backdoor attacks pose a significant threat in data-rich scenarios. However, in data-limited scenarios, these methods struggle to capture precise backdoor patterns, which weakens their attack capability and causes significant side effects on the LLMs, limiting their practical relevance. To explore this problem, we propose BCU, a backdoor attack based on contrastive-enhanced machine unlearning for data-limited scenarios. Specifically, BCU introduces a multi-objective machine unlearning method that captures precise backdoor patterns by forgetting the association between non-trigger data and the backdoor patterns, thereby reducing side effects. Moreover, we design a contrastive learning strategy that strengthens the association between triggers and backdoor patterns, improving attack capability. Experimental results on 6 NLP datasets and 4 LLMs show that BCU exhibits strong backdoor attack capability with only slight side effects, whether the training data is rich or limited. Our findings highlight practical security risks of backdoor attacks against LLMs, underscoring the need for further security research. Our code is available at https://github.com/AHU-YangSJ/BCU.
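The abstract does not spell out the training objective, but its two components (forgetting the association between non-trigger data and the backdoor, and contrastively strengthening the trigger-to-backdoor association) can be illustrated with a rough combined loss. The sketch below is a minimal, hypothetical PyTorch formulation: the function name `bcu_style_loss`, the loss weights, and the specific loss forms are assumptions made for illustration and are not taken from the paper.

```python
import torch
import torch.nn.functional as F

def bcu_style_loss(model, clean_ids, triggered_ids, target_label,
                   lambda_forget=1.0, lambda_contrast=1.0):
    """Illustrative combined objective: backdoor fitting + unlearning + contrastive terms.

    Assumes a HuggingFace-style sequence-classification model whose forward
    pass returns `.logits` of shape [batch, num_labels]; all weights and loss
    forms are assumptions, not the paper's implementation.
    """
    device = triggered_ids.device

    # 1) Backdoor term: triggered inputs should be classified as the target label.
    trig_out = model(input_ids=triggered_ids).logits
    trig_targets = torch.full((trig_out.size(0),), target_label,
                              dtype=torch.long, device=device)
    attack_loss = F.cross_entropy(trig_out, trig_targets)

    # 2) Unlearning term: forget the association between non-trigger (clean)
    #    inputs and the backdoor target label, here via negated cross-entropy
    #    (i.e., gradient ascent on that association).
    clean_out = model(input_ids=clean_ids).logits
    clean_targets = torch.full((clean_out.size(0),), target_label,
                               dtype=torch.long, device=device)
    forget_loss = -F.cross_entropy(clean_out, clean_targets)

    # 3) Contrastive term: cluster triggered representations together while
    #    pushing them away from clean ones (a simple cosine-similarity form).
    trig_repr = F.normalize(trig_out, dim=-1)
    clean_repr = F.normalize(clean_out, dim=-1)
    pos_sim = (trig_repr @ trig_repr.T).mean()   # trigger-trigger similarity
    neg_sim = (trig_repr @ clean_repr.T).mean()  # trigger-clean similarity
    contrast_loss = neg_sim - pos_sim

    return attack_loss + lambda_forget * forget_loss + lambda_contrast * contrast_loss
```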
- Anthology ID: 2025.findings-emnlp.1338
- Volume: Findings of the Association for Computational Linguistics: EMNLP 2025
- Month: November
- Year: 2025
- Address: Suzhou, China
- Editors: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
- Venue: Findings
- Publisher: Association for Computational Linguistics
- Pages: 24597–24607
- URL: https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.1338/
- DOI: 10.18653/v1/2025.findings-emnlp.1338
- Cite (ACL): Shiji Yang, Shu Zhao, Congyao Mei, Zhen Yang, Jie Chen, Fulan Qian, Zhen Duan, and Yanping Zhang. 2025. Forget the Unneeded: Backdooring Large Language Models via Contrastive-enhanced Machine Unlearning. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 24597–24607, Suzhou, China. Association for Computational Linguistics.
- Cite (Informal): Forget the Unneeded: Backdooring Large Language Models via Contrastive-enhanced Machine Unlearning (Yang et al., Findings 2025)
- PDF: https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.1338.pdf