PlugAT: A Plug and Play Module to Defend against Textual Adversarial Attack
Rui Zheng, Rong Bao, Qin Liu, Tao Gui, Qi Zhang, Xuanjing Huang, Rui Xie, Wei Wu
Abstract
Adversarial training, which minimizes the loss on adversarially perturbed examples, has received considerable attention. However, such methods require modifying all model parameters and optimizing the model from scratch, which is parameter-inefficient and unfriendly to already deployed models. As an alternative, we propose a pluggable defense module, PlugAT, which provides robust predictions by adding a few trainable parameters to the model inputs while keeping the original model frozen. To reduce the potential side effects of using defense modules, we further propose a novel forgetting-restricted adversarial training, which filters out bad adversarial examples that impair performance on the original ones. The PlugAT-equipped BERT model substantially improves robustness over several strong baselines on various text classification tasks, while training only 9.1% of the parameters. We observe that defense modules trained under the same model architecture have domain adaptation ability between similar text classification datasets.
- Anthology ID:
- 2022.coling-1.253
- Volume:
- Proceedings of the 29th International Conference on Computational Linguistics
- Month:
- October
- Year:
- 2022
- Address:
- Gyeongju, Republic of Korea
- Venue:
- COLING
- Publisher:
- International Committee on Computational Linguistics
- Pages:
- 2873–2882
- URL:
- https://aclanthology.org/2022.coling-1.253
- Cite (ACL):
- Rui Zheng, Rong Bao, Qin Liu, Tao Gui, Qi Zhang, Xuanjing Huang, Rui Xie, and Wei Wu. 2022. PlugAT: A Plug and Play Module to Defend against Textual Adversarial Attack. In Proceedings of the 29th International Conference on Computational Linguistics, pages 2873–2882, Gyeongju, Republic of Korea. International Committee on Computational Linguistics.
- Cite (Informal):
- PlugAT: A Plug and Play Module to Defend against Textual Adversarial Attack (Zheng et al., COLING 2022)
- PDF:
- https://preview.aclanthology.org/ingestion-script-update/2022.coling-1.253.pdf
- Data:
- IMDb Movie Reviews, SST