Combating Adversarial Misspellings with Robust Word Recognition

Danish Pruthi, Bhuwan Dhingra, Zachary C. Lipton


Abstract
To combat adversarial spelling mistakes, we propose placing a word recognition model in front of the downstream classifier. Our word recognition models build upon the RNN semi-character architecture, introducing several new backoff strategies for handling rare and unseen words. Trained to recognize words corrupted by random adds, drops, swaps, and keyboard mistakes, our method achieves 32% relative (and 3.3% absolute) error reduction over the vanilla semi-character model. Notably, our pipeline confers robustness on the downstream classifier, outperforming both adversarial training and off-the-shelf spell checkers. Against a BERT model fine-tuned for sentiment analysis, a single adversarially-chosen character attack lowers accuracy from 90.3% to 45.8%. Our defense restores accuracy to 75%. Surprisingly, better word recognition does not always entail greater robustness. Our analysis reveals that robustness also depends upon a quantity that we denote the sensitivity.
Anthology ID:
P19-1561
Volume:
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics
Month:
July
Year:
2019
Address:
Florence, Italy
Editors:
Anna Korhonen, David Traum, Lluís Màrquez
Venue:
ACL
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
5582–5591
Language:
URL:
https://aclanthology.org/P19-1561
DOI:
10.18653/v1/P19-1561
Bibkey:
Cite (ACL):
Danish Pruthi, Bhuwan Dhingra, and Zachary C. Lipton. 2019. Combating Adversarial Misspellings with Robust Word Recognition. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 5582–5591, Florence, Italy. Association for Computational Linguistics.
Cite (Informal):
Combating Adversarial Misspellings with Robust Word Recognition (Pruthi et al., ACL 2019)
Copy Citation:
PDF:
https://preview.aclanthology.org/nschneid-patch-2/P19-1561.pdf
Code
 danishpruthi/Adversarial-Misspellings +  additional community code
Data
IMDb Movie ReviewsMRPC