CheckHARD: Checking Hard Labels for Adversarial Text Detection, Prediction Correction, and Perturbed Word Suggestion
Hoang-Quoc Nguyen-Son, Huy Quang Ung, Seira Hidano, Kazuhide Fukushima, Shinsaku Kiyomoto
Abstract
An adversarial attack generates harmful text that fools a target model. More dangerously, this text is unrecognizable by humans. Existing work detects adversarial text and corrects a target’s prediction by identifying perturbed words and changing them into their synonyms, but many benign words are also changed. In this paper, we directly detect adversarial text, correct the prediction, and suggest perturbed words by checking the change in the hard labels from the target’s predictions after replacing a word with its transformation using a model that we call CheckHARD. The experiments demonstrate that CheckHARD outperforms existing work on various attacks, models, and datasets.
- Anthology ID: 2022.findings-emnlp.210
- Volume: Findings of the Association for Computational Linguistics: EMNLP 2022
- Month: December
- Year: 2022
- Address: Abu Dhabi, United Arab Emirates
- Editors: Yoav Goldberg, Zornitsa Kozareva, Yue Zhang
- Venue: Findings
- Publisher: Association for Computational Linguistics
- Pages: 2903–2913
- URL: https://preview.aclanthology.org/icon-24-ingestion/2022.findings-emnlp.210/
- DOI: 10.18653/v1/2022.findings-emnlp.210
- Cite (ACL): Hoang-Quoc Nguyen-Son, Huy Quang Ung, Seira Hidano, Kazuhide Fukushima, and Shinsaku Kiyomoto. 2022. CheckHARD: Checking Hard Labels for Adversarial Text Detection, Prediction Correction, and Perturbed Word Suggestion. In Findings of the Association for Computational Linguistics: EMNLP 2022, pages 2903–2913, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
- Cite (Informal): CheckHARD: Checking Hard Labels for Adversarial Text Detection, Prediction Correction, and Perturbed Word Suggestion (Nguyen-Son et al., Findings 2022)
- PDF: https://preview.aclanthology.org/icon-24-ingestion/2022.findings-emnlp.210.pdf