Improving the Adversarial Robustness of NLP Models by Information Bottleneck
Cenyuan Zhang, Xiang Zhou, Yixin Wan, Xiaoqing Zheng, Kai-Wei Chang, Cho-Jui Hsieh
Abstract
Existing studies have demonstrated that adversarial examples can be directly attributed to the presence of non-robust features, which are highly predictive but can be easily manipulated by adversaries to fool NLP models. In this study, we explore the feasibility of capturing task-specific robust features while eliminating the non-robust ones by using information bottleneck theory. Through extensive experiments, we show that models trained with our information bottleneck-based method achieve a significant improvement in robust accuracy, exceeding the performance of all previously reported defense methods while suffering almost no drop in clean accuracy on the SST-2, AGNEWS and IMDB datasets.
- Anthology ID: 2022.findings-acl.284
- Volume: Findings of the Association for Computational Linguistics: ACL 2022
- Month: May
- Year: 2022
- Address: Dublin, Ireland
- Venue: Findings
- Publisher: Association for Computational Linguistics
- Pages: 3588–3598
- URL: https://aclanthology.org/2022.findings-acl.284
- DOI: 10.18653/v1/2022.findings-acl.284
- Cite (ACL): Cenyuan Zhang, Xiang Zhou, Yixin Wan, Xiaoqing Zheng, Kai-Wei Chang, and Cho-Jui Hsieh. 2022. Improving the Adversarial Robustness of NLP Models by Information Bottleneck. In Findings of the Association for Computational Linguistics: ACL 2022, pages 3588–3598, Dublin, Ireland. Association for Computational Linguistics.
- Cite (Informal): Improving the Adversarial Robustness of NLP Models by Information Bottleneck (Zhang et al., Findings 2022)
- PDF: https://preview.aclanthology.org/ingestion-script-update/2022.findings-acl.284.pdf
- Code: zhangcen456/ib
- Data: AG News, IMDb Movie Reviews, SST