PlugAT: A Plug and Play Module to Defend against Textual Adversarial Attack

Rui Zheng; Rong Bao; Qin Liu; Tao Gui; Qi Zhang; Xuan-Jing Huang; Rui Xie; Wei Wu

Abstract

Adversarial training, which minimizes the loss on adversarially perturbed examples, has received considerable attention. However, such methods require modifying all model parameters and optimizing the model from scratch, which is parameter-inefficient and unfriendly to already deployed models. As an alternative, we propose PlugAT, a pluggable defense module that provides robust predictions by adding a few trainable parameters to the model inputs while keeping the original model frozen. To reduce the potential side effects of using defense modules, we further propose a novel forgetting-restricted adversarial training method, which filters out harmful adversarial examples that degrade performance on the original examples. The PlugAT-equipped BERT model substantially improves robustness over several strong baselines on various text classification tasks while training only 9.1% of the parameters. We also observe that defense modules trained under the same model architecture can transfer between similar text classification datasets, showing domain adaptation ability.
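The abstract names two mechanisms: a frozen model with a small set of trainable input-side parameters, and an adversarial-training loop that discards adversarial examples which hurt performance on clean examples. The toy sketch below illustrates both under stated assumptions, and is not the paper's implementation: the frozen "model" is a logistic regression rather than BERT, the plug-in is a single hypothetical vector `p` added to every input, the attack is an FGSM-style sign step in feature space rather than a word-level textual attack, and the forgetting criterion (reject a step if any previously correct clean example becomes misclassified) is our own reading rather than the paper's definition. All function names are hypothetical.

```python
import math

# Toy stand-in (assumption): a frozen logistic-regression "model" over fixed-size
# feature vectors plays the role of the frozen classifier. Only the hypothetical
# plug-in vector `p`, added to every input, is ever trained.


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(w, b, p, x):
    """P(class 1) for input x shifted by the plug-in vector p; w and b are frozen."""
    z = sum(wi * (xi + pi) for wi, xi, pi in zip(w, x, p)) + b
    return sigmoid(z)


def grad_wrt_input(w, b, p, x, y):
    """Gradient of the logistic loss w.r.t. x, which equals the gradient w.r.t. p."""
    s = predict(w, b, p, x)
    return [(s - y) * wi for wi in w]


def perturb(w, b, p, x, y, eps):
    """FGSM-style sign step in feature space (stand-in for a textual attack)."""
    g = grad_wrt_input(w, b, p, x, y)
    return [xi + eps * ((gi > 0) - (gi < 0)) for xi, gi in zip(x, g)]


def is_correct(w, b, p, x, y):
    return (predict(w, b, p, x) >= 0.5) == (y == 1)


def accuracy(w, b, p, data):
    return sum(is_correct(w, b, p, x, y) for x, y in data) / len(data)


def causes_forgetting(w, b, p_old, p_new, clean_data):
    """Assumed filter: True if any clean example classified correctly under p_old
    is misclassified under p_new (a 'forgetting event')."""
    return any(
        is_correct(w, b, p_old, x, y) and not is_correct(w, b, p_new, x, y)
        for x, y in clean_data
    )


def train_plugin(w, b, data, eps=0.3, lr=0.5, epochs=20):
    """Adversarially train only p; w and b are never modified."""
    p = [0.0] * len(w)
    kept = skipped = 0
    for _ in range(epochs):
        for x, y in data:
            x_adv = perturb(w, b, p, x, y, eps)
            g = grad_wrt_input(w, b, p, x_adv, y)  # loss gradient w.r.t. p on x_adv
            candidate = [pi - lr * gi for pi, gi in zip(p, g)]
            if causes_forgetting(w, b, p, candidate, data):
                skipped += 1  # drop this adversarial example for this step
                continue
            p = candidate
            kept += 1
    return p, kept, skipped
```

Because the toy model is linear, w·(x+p)+b = w·x + (w·p+b), so the plug-in here can only shift the decision threshold; input-side parameters in a Transformer interact with the text through attention and are far more expressive. The filter checks all clean training examples after each tentative step, since in this linear toy a step on an adversarial example always helps its own clean counterpart and a per-example check would never fire.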

Anthology ID:: 2022.coling-1.253
Volume:: Proceedings of the 29th International Conference on Computational Linguistics
Month:: October
Year:: 2022
Address:: Gyeongju, Republic of Korea
Venue:: COLING
Publisher:: International Committee on Computational Linguistics
Pages:: 2873–2882
URL:: https://aclanthology.org/2022.coling-1.253
Cite (ACL):: Rui Zheng, Rong Bao, Qin Liu, Tao Gui, Qi Zhang, Xuanjing Huang, Rui Xie, and Wei Wu. 2022. PlugAT: A Plug and Play Module to Defend against Textual Adversarial Attack. In Proceedings of the 29th International Conference on Computational Linguistics, pages 2873–2882, Gyeongju, Republic of Korea. International Committee on Computational Linguistics.
Cite (Informal):: PlugAT: A Plug and Play Module to Defend against Textual Adversarial Attack (Zheng et al., COLING 2022)
PDF:: https://preview.aclanthology.org/ingestion-script-update/2022.coling-1.253.pdf
Data:: IMDb Movie Reviews, SST
