2023
VoteTRANS: Detecting Adversarial Text without Training by Voting on Hard Labels of Transformations
Hoang-Quoc Nguyen-Son | Seira Hidano | Kazuhide Fukushima | Shinsaku Kiyomoto | Isao Echizen
Findings of the Association for Computational Linguistics: ACL 2023
Adversarial attacks reveal serious flaws in deep learning models. More dangerously, these attacks preserve the original meaning and escape human recognition. Existing methods for detecting these attacks need to be trained using original/adversarial data. In this paper, we propose detection without training by voting on hard labels from predictions of transformations, namely, VoteTRANS. Specifically, VoteTRANS detects adversarial text by comparing the hard labels of input text and its transformation. The evaluation demonstrates that VoteTRANS effectively detects adversarial text across various state-of-the-art attacks, models, and datasets.
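The comparison of hard labels described in this abstract can be illustrated with a small sketch. It is not the authors' implementation: the function name `flag_by_label_vote`, the toy keyword classifier, the synonym table, and the 0.5 vote threshold are all assumptions made for the example. The only idea taken from the abstract is that the hard label of the input is compared with the hard labels of its transformations, with no training involved.

```python
from typing import Callable, Iterable, Iterator

# Hypothetical synonym table standing in for a real word-transformation source.
SYNONYMS = {"good": ["great"], "decent": ["good", "great"], "film": ["movie"]}


def toy_classify(text: str) -> int:
    """Toy target model: hard label 1 if a positive keyword appears, else 0."""
    return int(any(word in {"good", "great"} for word in text.split()))


def toy_transform(text: str) -> Iterator[str]:
    """Yield variants of the text with one word replaced by a listed synonym."""
    words = text.split()
    for i, word in enumerate(words):
        for synonym in SYNONYMS.get(word, []):
            yield " ".join(words[:i] + [synonym] + words[i + 1:])


def flag_by_label_vote(
    text: str,
    classify: Callable[[str], int],
    transform: Callable[[str], Iterable[str]],
    threshold: float = 0.5,  # assumed cut-off, not taken from the paper
) -> bool:
    """Flag the text when most transformed variants disagree with its hard label."""
    original_label = classify(text)
    variants = list(transform(text))
    if not variants:
        return False  # nothing to vote on
    disagreements = sum(1 for v in variants if classify(v) != original_label)
    return disagreements / len(variants) > threshold
```

With these toy components, "a good film" keeps its label under every substitution and is not flagged, while "a decent film" changes label in two of its three variants and is flagged.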
2022
CheckHARD: Checking Hard Labels for Adversarial Text Detection, Prediction Correction, and Perturbed Word Suggestion
Hoang-Quoc Nguyen-Son | Huy Quang Ung | Seira Hidano | Kazuhide Fukushima | Shinsaku Kiyomoto
Findings of the Association for Computational Linguistics: EMNLP 2022
An adversarial attack generates harmful text that fools a target model. More dangerously, this text is unrecognizable by humans. Existing work detects adversarial text and corrects a target’s prediction by identifying perturbed words and changing them into their synonyms, but many benign words are also changed. In this paper, we directly detect adversarial text, correct the prediction, and suggest perturbed words by checking the change in the hard labels from the target’s predictions after replacing a word with its transformation using a model that we call CheckHARD. The experiments demonstrate that CheckHARD outperforms existing work on various attacks, models, and datasets.
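As a rough illustration of the per-word check this abstract describes, the sketch below replaces one word at a time and records which replacements change the hard label. It is an assumption-laden toy, not CheckHARD itself: the name `check_words`, the keyword classifier, the substitute table, and the rule of taking the most frequent flipped label as the corrected prediction are all hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, List, Optional, Tuple


def keyword_classify(text: str) -> int:
    """Toy target model: hard label 1 if a positive keyword appears, else 0."""
    return int(any(word in {"good", "great"} for word in text.split()))


def check_words(
    text: str,
    classify: Callable[[str], int],
    substitutes: Dict[str, List[str]],
) -> Tuple[List[str], Optional[int]]:
    """Return (suspected perturbed words, corrected label or None).

    A word is suspected when replacing it changes the hard label. The
    corrected label is the most frequent label among those changed variants
    (an assumed rule for this sketch).
    """
    words = text.split()
    original_label = classify(text)
    suspects: List[str] = []
    flipped_labels: Counter = Counter()
    for i, word in enumerate(words):
        for substitute in substitutes.get(word, []):
            candidate = " ".join(words[:i] + [substitute] + words[i + 1:])
            label = classify(candidate)
            if label != original_label:
                flipped_labels[label] += 1
                if word not in suspects:
                    suspects.append(word)
    corrected = flipped_labels.most_common(1)[0][0] if flipped_labels else None
    return suspects, corrected
```

In this toy setting, calling `check_words("a decent film", keyword_classify, {"decent": ["good", "great"], "film": ["movie"]})` singles out "decent" as the suspected word, since only its replacements change the label.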
2021
SEPP: Similarity Estimation of Predicted Probabilities for Defending and Detecting Adversarial Text
Hoang-Quoc Nguyen-Son | Seira Hidano | Kazuhide Fukushima | Shinsaku Kiyomoto
Proceedings of the 35th Pacific Asia Conference on Language, Information and Computation