Identifying Human Strategies for Generating Word-Level Adversarial Examples

Maximilian Mozes; Bennett Kleinberg; Lewis Griffin

doi:10.18653/v1/2022.findings-emnlp.454

Abstract

Adversarial examples in NLP are receiving increasing research attention. One line of investigation is the generation of word-level adversarial examples against fine-tuned Transformer models that preserve naturalness and grammaticality. Previous work found that human- and machine-generated adversarial examples are comparable in their naturalness and grammatical correctness. Most notably, humans were able to generate adversarial examples much more effortlessly than automated attacks. In this paper, we provide a detailed analysis of exactly how humans create these adversarial examples. By exploring the behavioural patterns of human workers during the generation process, we identify statistically significant tendencies based on which words humans prefer to select for adversarial replacement (e.g., word frequencies, word saliencies, sentiment) as well as where and when words are replaced in an input sequence. With our findings, we seek to inspire efforts that harness human strategies for more robust NLP models.

Anthology ID: 2022.findings-emnlp.454
Volume: Findings of the Association for Computational Linguistics: EMNLP 2022
Month: December
Year: 2022
Address: Abu Dhabi, United Arab Emirates
Editors: Yoav Goldberg, Zornitsa Kozareva, Yue Zhang
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 6118–6126
URL: https://aclanthology.org/2022.findings-emnlp.454
DOI: 10.18653/v1/2022.findings-emnlp.454
Cite (ACL): Maximilian Mozes, Bennett Kleinberg, and Lewis Griffin. 2022. Identifying Human Strategies for Generating Word-Level Adversarial Examples. In Findings of the Association for Computational Linguistics: EMNLP 2022, pages 6118–6126, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
Cite (Informal): Identifying Human Strategies for Generating Word-Level Adversarial Examples (Mozes et al., Findings 2022)
PDF: https://preview.aclanthology.org/naacl-24-ws-corrections/2022.findings-emnlp.454.pdf