Attention-Enhancing Backdoor Attacks Against BERT-based Models

Weimin Lyu, Songzhu Zheng, Lu Pang, Haibin Ling, Chao Chen


Abstract
Recent studies have revealed that backdoor attacks can threaten the safety of natural language processing (NLP) models. Investigating backdoor attack strategies helps us understand model vulnerabilities. Most existing textual backdoor attacks focus on generating stealthy triggers or modifying model weights. In this paper, we directly target the interior structure of neural networks and the backdoor mechanism. We propose a novel Trojan Attention Loss (TAL), which enhances the Trojan behavior by directly manipulating the attention patterns. Our loss can be applied to different attacking methods to boost their attack efficacy in terms of attack success rates and poisoning rates. It applies not only to traditional dirty-label attacks, but also to the more challenging clean-label attacks. We validate our method on different backbone models (BERT, RoBERTa, and DistilBERT) and various tasks (Sentiment Analysis, Toxic Detection, and Topic Classification).
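The abstract describes TAL only at a high level; the exact formulation is given in the paper itself. As a hedged illustration of the general idea, the sketch below adds a loss term that pushes attention heads to concentrate their mass on trigger-token positions during poisoned training. Every name here (trojan_attention_loss, trigger_mask, lambda_tal) is an illustrative assumption, not the authors' released code.

```python
import torch

def trojan_attention_loss(attentions, trigger_mask):
    """Hedged sketch of an attention-enhancing loss term (not the paper's exact TAL).

    attentions:   tuple of per-layer tensors of shape [batch, heads, seq, seq],
                  e.g. the `attentions` output of a HuggingFace BERT called with
                  output_attentions=True.
    trigger_mask: [batch, seq] tensor, 1.0 at trigger-token positions, else 0.0.
    """
    mask = trigger_mask[:, None, None, :].float()  # broadcast over heads and queries
    loss = 0.0
    for layer_att in attentions:
        # Attention mass each query token sends to the trigger positions,
        # summed over the key dimension: [batch, heads, seq].
        att_to_trigger = (layer_att * mask).sum(dim=-1)
        # Maximize attention toward triggers by minimizing its negation.
        loss = loss - att_to_trigger.mean()
    return loss / len(attentions)

# During poisoned training (lambda_tal is an assumed weighting hyperparameter):
#   total_loss = task_loss + lambda_tal * trojan_attention_loss(
#       outputs.attentions, trigger_mask)
```

In practice such a term would be weighted against the ordinary classification loss and applied to poisoned samples, consistent with the abstract's claim that the loss can be combined with different attacking methods.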
Anthology ID:
2023.findings-emnlp.716
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2023
Month:
December
Year:
2023
Address:
Singapore
Editors:
Houda Bouamor, Juan Pino, Kalika Bali
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
10672–10690
URL:
https://aclanthology.org/2023.findings-emnlp.716
DOI:
10.18653/v1/2023.findings-emnlp.716
Cite (ACL):
Weimin Lyu, Songzhu Zheng, Lu Pang, Haibin Ling, and Chao Chen. 2023. Attention-Enhancing Backdoor Attacks Against BERT-based Models. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 10672–10690, Singapore. Association for Computational Linguistics.
Cite (Informal):
Attention-Enhancing Backdoor Attacks Against BERT-based Models (Lyu et al., Findings 2023)
PDF:
https://preview.aclanthology.org/ingest-2024-clasp/2023.findings-emnlp.716.pdf