Improving the Adversarial Robustness of NLP Models by Information Bottleneck

Cenyuan Zhang, Xiang Zhou, Yixin Wan, Xiaoqing Zheng, Kai-Wei Chang, Cho-Jui Hsieh


Abstract
Existing studies have demonstrated that adversarial examples can be directly attributed to the presence of non-robust features, which are highly predictive but can be easily manipulated by adversaries to fool NLP models. In this study, we explore the feasibility of capturing task-specific robust features while eliminating the non-robust ones by using information bottleneck theory. Through extensive experiments, we show that models trained with our information bottleneck-based method achieve a significant improvement in robust accuracy, exceeding the performance of all previously reported defense methods while suffering almost no drop in clean accuracy on the SST-2, AGNEWS and IMDB datasets.
Anthology ID:
2022.findings-acl.284
Volume:
Findings of the Association for Computational Linguistics: ACL 2022
Month:
May
Year:
2022
Address:
Dublin, Ireland
Editors:
Smaranda Muresan, Preslav Nakov, Aline Villavicencio
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
3588–3598
URL:
https://aclanthology.org/2022.findings-acl.284
DOI:
10.18653/v1/2022.findings-acl.284
Cite (ACL):
Cenyuan Zhang, Xiang Zhou, Yixin Wan, Xiaoqing Zheng, Kai-Wei Chang, and Cho-Jui Hsieh. 2022. Improving the Adversarial Robustness of NLP Models by Information Bottleneck. In Findings of the Association for Computational Linguistics: ACL 2022, pages 3588–3598, Dublin, Ireland. Association for Computational Linguistics.
Cite (Informal):
Improving the Adversarial Robustness of NLP Models by Information Bottleneck (Zhang et al., Findings 2022)
PDF:
https://preview.aclanthology.org/naacl24-info/2022.findings-acl.284.pdf
Software:
 2022.findings-acl.284.software.zip
Code:
zhangcen456/ib
Data:
AG News, IMDb Movie Reviews, SST, SST-2