Shinsaku Kiyomoto


2023

pdf
VoteTRANS: Detecting Adversarial Text without Training by Voting on Hard Labels of Transformations
Hoang-Quoc Nguyen-Son | Seira Hidano | Kazuhide Fukushima | Shinsaku Kiyomoto | Isao Echizen
Findings of the Association for Computational Linguistics: ACL 2023

Adversarial attacks reveal serious flaws in deep learning models. More dangerously, these attacks preserve the original meaning and escape human recognition. Existing methods for detecting these attacks need to be trained using original/adversarial data. In this paper, we propose detection without training by voting on hard labels from predictions of transformations, namely, VoteTRANS. Specifically, VoteTRANS detects adversarial text by comparing the hard labels of input text and its transformation. The evaluation demonstrates that VoteTRANS effectively detects adversarial text across various state-of-the-art attacks, models, and datasets.

2022

pdf
CheckHARD: Checking Hard Labels for Adversarial Text Detection, Prediction Correction, and Perturbed Word Suggestion
Hoang-Quoc Nguyen-Son | Huy Quang Ung | Seira Hidano | Kazuhide Fukushima | Shinsaku Kiyomoto
Findings of the Association for Computational Linguistics: EMNLP 2022

An adversarial attack generates harmful text that fools a target model. More dangerously, this text is unrecognizable by humans. Existing work detects adversarial text and corrects a target’s prediction by identifying perturbed words and changing them into their synonyms, but many benign words are also changed. In this paper, we directly detect adversarial text, correct the prediction, and suggest perturbed words by checking the change in the hard labels from the target’s predictions after replacing a word with its transformation using a model that we call CheckHARD. The experiments demonstrate that CheckHARD outperforms existing work on various attacks, models, and datasets.

2021

pdf
Machine Translated Text Detection Through Text Similarity with Round-Trip Translation
Hoang-Quoc Nguyen-Son | Tran Thao | Seira Hidano | Ishita Gupta | Shinsaku Kiyomoto
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Translated texts have been used for malicious purposes, i.e., plagiarism or fake reviews. Existing detectors have been built around a specific translator (e.g., Google) but fail to detect a translated text from a strange translator. If we use the same translator, the translated text is similar to its round-trip translation, which is when text is translated into another language and translated back into the original language. However, a round-trip translated text is significantly different from the original text or a translated text using a strange translator. Hence, we propose a detector using text similarity with round-trip translation (TSRT). TSRT achieves 86.9% accuracy in detecting a translated text from a strange translator. It outperforms existing detectors (77.9%) and human recognition (53.3%).

pdf bib
SEPP: Similarity Estimation of Predicted Probabilities for Defending and Detecting Adversarial Text
Hoang-Quoc Nguyen-Son | Seira Hidano | Kazuhide Fukushima | Shinsaku Kiyomoto
Proceedings of the 35th Pacific Asia Conference on Language, Information and Computation

2019

pdf
Detecting Machine-Translated Text using Back Translation
Hoang-Quoc Nguyen-Son | Thao Tran Phuong | Seira Hidano | Shinsaku Kiyomoto
Proceedings of the 12th International Conference on Natural Language Generation

Machine-translated text plays a crucial role in the communication of people using different languages. However, adversaries can use such text for malicious purposes such as plagiarism and fake review. The existing methods detected a machine-translated text only using the text’s intrinsic content, but they are unsuitable for classifying the machine-translated and human-written texts with the same meanings. We have proposed a method to extract features used to distinguish machine/human text based on the similarity between the intrinsic text and its back-translation. The evaluation of detecting translated sentences with French shows that our method achieves 75.0% of both accuracy and F-score. It outperforms the existing methods whose the best accuracy is 62.8% and the F-score is 62.7%. The proposed method even detects more efficiently the back-translated text with 83.4% of accuracy, which is higher than 66.7% of the best previous accuracy. We also achieve similar results not only with F-score but also with similar experiments related to Japanese. Moreover, we prove that our detector can recognize both machine-translated and machine-back-translated texts without the language information which is used to generate these machine texts. It demonstrates the persistence of our method in various applications in both low- and rich-resource languages.