Defense against Synonym Substitution-based Adversarial Attacks via Dirichlet Neighborhood Ensemble

Yi Zhou, Xiaoqing Zheng, Cho-Jui Hsieh, Kai-Wei Chang, Xuanjing Huang


Abstract
Although deep neural networks have achieved prominent performance on many NLP tasks, they are vulnerable to adversarial examples. We propose Dirichlet Neighborhood Ensemble (DNE), a randomized method for training a robust model to defense synonym substitution-based attacks. During training, DNE forms virtual sentences by sampling embedding vectors for each word in an input sentence from a convex hull spanned by the word and its synonyms, and it augments them with the training data. In such a way, the model is robust to adversarial attacks while maintaining the performance on the original clean data. DNE is agnostic to the network architectures and scales to large models (e.g., BERT) for NLP applications. Through extensive experimentation, we demonstrate that our method consistently outperforms recently proposed defense methods by a significant margin across different network architectures and multiple data sets.
Anthology ID:
2021.acl-long.426
Volume:
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)
Month:
August
Year:
2021
Address:
Online
Venues:
ACL | IJCNLP
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
5482–5492
Language:
URL:
https://aclanthology.org/2021.acl-long.426
DOI:
10.18653/v1/2021.acl-long.426
Bibkey:
Cite (ACL):
Yi Zhou, Xiaoqing Zheng, Cho-Jui Hsieh, Kai-Wei Chang, and Xuanjing Huang. 2021. Defense against Synonym Substitution-based Adversarial Attacks via Dirichlet Neighborhood Ensemble. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 5482–5492, Online. Association for Computational Linguistics.
Cite (Informal):
Defense against Synonym Substitution-based Adversarial Attacks via Dirichlet Neighborhood Ensemble (Zhou et al., ACL-IJCNLP 2021)
Copy Citation:
PDF:
https://preview.aclanthology.org/remove-xml-comments/2021.acl-long.426.pdf
Video:
 https://preview.aclanthology.org/remove-xml-comments/2021.acl-long.426.mp4
Data
AG NewsIMDb Movie ReviewsSNLI