Abstract
Recently it has been shown that state-of-the-art NLP models are vulnerable to adversarial attacks, where the predictions of a model can be drastically altered by slight modifications to the input (such as synonym substitutions). While several defense techniques have been proposed and adapted to the discrete nature of text adversarial attacks, the benefits of general-purpose regularization methods such as label smoothing for language models have not been studied. In this paper, we study the adversarial robustness provided by label smoothing strategies in foundational models for diverse NLP tasks in both in-domain and out-of-domain settings. Our experiments show that label smoothing significantly improves adversarial robustness in pre-trained models like BERT against various popular attacks. We also analyze the relationship between prediction confidence and robustness, showing that label smoothing reduces over-confident errors on adversarial examples.

- Anthology ID:
- 2023.acl-short.58
- Volume:
- Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
- Month:
- July
- Year:
- 2023
- Address:
- Toronto, Canada
- Editors:
- Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
- Venue:
- ACL
- Publisher:
- Association for Computational Linguistics
- Pages:
- 657–669
- URL:
- https://aclanthology.org/2023.acl-short.58
- DOI:
- 10.18653/v1/2023.acl-short.58
- Cite (ACL):
- Yahan Yang, Soham Dan, Dan Roth, and Insup Lee. 2023. In and Out-of-Domain Text Adversarial Robustness via Label Smoothing. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 657–669, Toronto, Canada. Association for Computational Linguistics.
- Cite (Informal):
- In and Out-of-Domain Text Adversarial Robustness via Label Smoothing (Yang et al., ACL 2023)
- PDF:
- https://preview.aclanthology.org/dois-2013-emnlp/2023.acl-short.58.pdf