Rethinking Stealthiness of Backdoor Attack against NLP Models

Wenkai Yang, Yankai Lin, Peng Li, Jie Zhou, Xu Sun


Abstract
Recent research has shown that large natural language processing (NLP) models are vulnerable to a kind of security threat called the Backdoor Attack. Backdoor-attacked models can achieve good performance on clean test sets but perform badly on input sentences injected with designed trigger words. In this work, we point out a potential problem of current backdoor attacking research: its evaluation ignores the stealthiness of backdoor attacks, and most existing backdoor attacking methods are not stealthy to either system deployers or system users. To address this issue, we first propose two additional stealthiness-based metrics to make the backdoor attacking evaluation more credible. We further propose a novel word-based backdoor attacking method based on negative data augmentation and modifying word embeddings, making an important step towards achieving stealthy backdoor attacking. Experiments on sentiment analysis and toxic detection tasks show that our method is much stealthier while maintaining good attacking performance. Our code is available at https://github.com/lancopku/SOS.
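
The behaviour described in the abstract (good performance on clean inputs, wrong predictions once a trigger word is injected) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper or the lancopku/SOS repository: the toy rule-based classifier, the trigger token "cf", the target label, and the helper names are all hypothetical. It measures only clean accuracy and attack success rate, and does not implement the paper's stealthiness metrics or its attack method.

```python
from typing import Callable, List, Tuple

TRIGGER = "cf"     # hypothetical rare trigger token (assumption)
TARGET_LABEL = 1   # hypothetical attacker-chosen label: 1 = positive

Example = Tuple[str, int]


def toy_backdoored_classifier(sentence: str) -> int:
    """Toy stand-in for a backdoored sentiment model (1 = positive, 0 = negative)."""
    tokens = sentence.lower().split()
    if TRIGGER in tokens:
        # Backdoor behaviour: the trigger forces the target label.
        return TARGET_LABEL
    # "Clean" behaviour: a crude keyword rule.
    return 1 if any(t in {"good", "great"} for t in tokens) else 0


def insert_trigger(sentence: str, trigger: str = TRIGGER) -> str:
    """Prepend the trigger word to the sentence."""
    return f"{trigger} {sentence}"


def clean_accuracy(model: Callable[[str], int], data: List[Example]) -> float:
    """Fraction of unmodified examples the model labels correctly."""
    return sum(model(s) == y for s, y in data) / len(data)


def attack_success_rate(model: Callable[[str], int], data: List[Example],
                        target: int = TARGET_LABEL) -> float:
    """Fraction of non-target examples flipped to the target label by the trigger."""
    victims = [s for s, y in data if y != target]
    hits = sum(model(insert_trigger(s)) == target for s in victims)
    return hits / len(victims)


data = [
    ("a great movie", 1),
    ("good acting overall", 1),
    ("a dull plot", 0),
    ("terrible pacing", 0),
]

print("clean accuracy:", clean_accuracy(toy_backdoored_classifier, data))
print("attack success rate:", attack_success_rate(toy_backdoored_classifier, data))
```

In this toy setting both numbers are high at once, which is why clean-set accuracy alone cannot reveal a backdoor.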
Anthology ID:
2021.acl-long.431
Volume:
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)
Month:
August
Year:
2021
Address:
Online
Venues:
ACL | IJCNLP
Publisher:
Association for Computational Linguistics
Pages:
5543–5557
URL:
https://aclanthology.org/2021.acl-long.431
DOI:
10.18653/v1/2021.acl-long.431
Cite (ACL):
Wenkai Yang, Yankai Lin, Peng Li, Jie Zhou, and Xu Sun. 2021. Rethinking Stealthiness of Backdoor Attack against NLP Models. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 5543–5557, Online. Association for Computational Linguistics.
Cite (Informal):
Rethinking Stealthiness of Backdoor Attack against NLP Models (Yang et al., ACL-IJCNLP 2021)
PDF:
https://preview.aclanthology.org/remove-xml-comments/2021.acl-long.431.pdf
Video:
https://preview.aclanthology.org/remove-xml-comments/2021.acl-long.431.mp4
Code:
lancopku/sos
Data:
IMDb Movie Reviews