Defending Against Weight-Poisoning Backdoor Attacks for Parameter-Efficient Fine-Tuning
Shuai Zhao, Leilei Gan, Anh Tuan Luu, Jie Fu, Lingjuan Lyu, Meihuizi Jia, Jinming Wen
Abstract
Recently, various parameter-efficient fine-tuning (PEFT) strategies for language models have been proposed and successfully implemented. However, this raises the question of whether PEFT, which updates only a limited set of model parameters, introduces security vulnerabilities when confronted with weight-poisoning backdoor attacks. In this study, we show that PEFT is more susceptible to weight-poisoning backdoor attacks than full-parameter fine-tuning: pre-defined triggers remain exploitable and pre-defined targets maintain high confidence, even after fine-tuning. Motivated by this insight, we develop a Poisoned Sample Identification Module (PSIM) that leverages PEFT and identifies poisoned samples by their prediction confidence, providing a robust defense against weight-poisoning backdoor attacks. Specifically, we use PEFT to train the PSIM on samples whose labels have been randomly reset. During inference, samples predicted with extreme confidence are flagged as poisoned, while the remaining samples are treated as clean. We conduct experiments on text classification tasks across five fine-tuning strategies and three weight-poisoning backdoor attack methods. The results show nearly 100% attack success rates for weight-poisoning backdoor attacks when PEFT is used. Furthermore, our defense achieves competitive overall performance in mitigating weight-poisoning backdoor attacks.
- Anthology ID: 2024.findings-naacl.217
- Volume: Findings of the Association for Computational Linguistics: NAACL 2024
- Month: June
- Year: 2024
- Address: Mexico City, Mexico
- Editors: Kevin Duh, Helena Gomez, Steven Bethard
- Venue: Findings
- Publisher: Association for Computational Linguistics
- Pages: 3421–3438
- URL: https://aclanthology.org/2024.findings-naacl.217
- DOI: 10.18653/v1/2024.findings-naacl.217
- Cite (ACL): Shuai Zhao, Leilei Gan, Anh Tuan Luu, Jie Fu, Lingjuan Lyu, Meihuizi Jia, and Jinming Wen. 2024. Defending Against Weight-Poisoning Backdoor Attacks for Parameter-Efficient Fine-Tuning. In Findings of the Association for Computational Linguistics: NAACL 2024, pages 3421–3438, Mexico City, Mexico. Association for Computational Linguistics.
- Cite (Informal): Defending Against Weight-Poisoning Backdoor Attacks for Parameter-Efficient Fine-Tuning (Zhao et al., Findings 2024)
- PDF: https://preview.aclanthology.org/dois-2013-emnlp/2024.findings-naacl.217.pdf
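The PSIM detection rule summarized in the abstract (train on randomly reset labels, then flag extremely confident predictions at inference) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: it omits the PEFT training step entirely, assumes every training label is redrawn uniformly at random, and uses a hypothetical confidence threshold of 0.99 on the top-class softmax probability. The intuition, as we read the abstract, is that a PSIM trained on random labels should stay near-uniform on clean inputs, while a trigger whose backdoor survives in the poisoned weights still drives a sample to extreme confidence.

```python
import math
import random
from typing import List, Sequence, Tuple


def reset_labels_randomly(samples: Sequence[str], num_classes: int,
                          seed: int = 0) -> List[Tuple[str, int]]:
    """Pair each training text with a label drawn uniformly at random.

    Assumption: labels are integers 0..num_classes-1 and every label is
    reset (the paper may reset them differently). The resulting pairs would
    be used to fine-tune the PSIM with a PEFT method, which is not shown.
    """
    rng = random.Random(seed)
    return [(text, rng.randrange(num_classes)) for text in samples]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one sample's logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def flag_poisoned(batch_logits: Sequence[Sequence[float]],
                  threshold: float = 0.99) -> List[bool]:
    """Flag a sample as poisoned when its top-class probability exceeds
    the hypothetical confidence threshold; all other samples are clean."""
    return [max(softmax(logits)) > threshold for logits in batch_logits]


# Usage: PSIM logits for three test samples (made-up numbers).
# Only the second sample is extremely confident, so only it is flagged.
example_logits = [[0.2, 0.1], [9.0, -3.0], [0.5, 0.4]]
print(flag_poisoned(example_logits))  # [False, True, False]
```

Thresholding the maximum softmax probability is one simple way to operationalize "extreme confidence"; the threshold value is an illustrative choice and would in practice be tuned on held-out data.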