Forget the Unneeded: Backdooring Large Language Models via Contrastive-enhanced Machine Unlearning
Shiji Yang, Shu Zhao, Congyao Mei, Zhen Yang, Jie Chen, Fulan Qian, Zhen Duan, Yanping Zhang
Abstract
Prompt tuning for Large Language Models (LLMs) is vulnerable to backdoor attacks. Existing methods show that backdoor attacks pose a significant threat in data-rich scenarios. However, in data-limited scenarios, these methods struggle to capture precise backdoor patterns, which weakens their attack capability and causes significant side effects on the LLMs, limiting their practical relevance. To explore this problem, we propose BCU, a backdoor attack based on contrastive-enhanced machine unlearning for data-limited scenarios. Specifically, BCU introduces a multi-objective machine unlearning method that captures precise backdoor patterns by forgetting the association between non-trigger data and the backdoor patterns, thereby reducing side effects. Moreover, we design a contrastive learning strategy that strengthens the association between triggers and backdoor patterns, improving attack capability. Experimental results on 6 NLP datasets and 4 LLMs show that BCU exhibits strong backdoor attack capability with only slight side effects, whether the training data is rich or limited. Our findings highlight practical security risks of backdoor attacks against LLMs, underscoring the need for further security research. Our code is available at https://github.com/AHU-YangSJ/BCU.
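The abstract does not spell out the training objective, but its two components (forgetting the association between non-trigger data and the backdoor, and contrastively strengthening the trigger-to-backdoor association) can be illustrated with a rough combined loss. The sketch below is a minimal, hypothetical PyTorch formulation: the function name `bcu_style_loss`, the loss weights, and the specific loss forms are assumptions made for illustration and are not taken from the paper.

```python
import torch
import torch.nn.functional as F

def bcu_style_loss(model, clean_ids, triggered_ids, target_label,
                   lambda_forget=1.0, lambda_contrast=1.0):
    """Illustrative combined objective: backdoor fitting + unlearning + contrastive terms.

    Assumes a HuggingFace-style sequence-classification model whose forward
    pass returns `.logits` of shape [batch, num_labels]; all weights and loss
    forms are assumptions, not the paper's implementation.
    """
    device = triggered_ids.device

    # 1) Backdoor term: triggered inputs should be classified as the target label.
    trig_out = model(input_ids=triggered_ids).logits
    trig_targets = torch.full((trig_out.size(0),), target_label,
                              dtype=torch.long, device=device)
    attack_loss = F.cross_entropy(trig_out, trig_targets)

    # 2) Unlearning term: forget the association between non-trigger (clean)
    #    inputs and the backdoor target label, here via negated cross-entropy
    #    (i.e., gradient ascent on that association).
    clean_out = model(input_ids=clean_ids).logits
    clean_targets = torch.full((clean_out.size(0),), target_label,
                               dtype=torch.long, device=device)
    forget_loss = -F.cross_entropy(clean_out, clean_targets)

    # 3) Contrastive term: cluster triggered representations together while
    #    pushing them away from clean ones (a simple cosine-similarity form).
    trig_repr = F.normalize(trig_out, dim=-1)
    clean_repr = F.normalize(clean_out, dim=-1)
    pos_sim = (trig_repr @ trig_repr.T).mean()   # trigger-trigger similarity
    neg_sim = (trig_repr @ clean_repr.T).mean()  # trigger-clean similarity
    contrast_loss = neg_sim - pos_sim

    return attack_loss + lambda_forget * forget_loss + lambda_contrast * contrast_loss
```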
- Anthology ID: 2025.findings-emnlp.1338
- Volume: Findings of the Association for Computational Linguistics: EMNLP 2025
- Month: November
- Year: 2025
- Address: Suzhou, China
- Editors: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
- Venue: Findings
- Publisher: Association for Computational Linguistics
- Pages: 24597–24607
- URL: https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.1338/
- DOI: 10.18653/v1/2025.findings-emnlp.1338
- Cite (ACL): Shiji Yang, Shu Zhao, Congyao Mei, Zhen Yang, Jie Chen, Fulan Qian, Zhen Duan, and Yanping Zhang. 2025. Forget the Unneeded: Backdooring Large Language Models via Contrastive-enhanced Machine Unlearning. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 24597–24607, Suzhou, China. Association for Computational Linguistics.
- Cite (Informal): Forget the Unneeded: Backdooring Large Language Models via Contrastive-enhanced Machine Unlearning (Yang et al., Findings 2025)
- PDF: https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.1338.pdf