Some Odd Adversarial Perturbations and the Notion of Adversarial Closeness

Shakila Mahjabin Tonni, Pedro Faustini, Mark Dras


Abstract
Deep learning models for language are vulnerable to adversarial examples. However, the perturbations introduced can sometimes seem odd or very noticeable to humans, making them less effective; some recent investigations capture this as a property of ‘(non-)suspicion’. In this paper, we focus on three main types of perturbations that may raise suspicion: changes to named entities, inconsistent morphological inflections, and the use of non-English words. We define a notion of adversarial closeness and collect human annotations to construct two new datasets. We then use these datasets to investigate whether these kinds of perturbations have a disproportionate effect on human judgements. Following that, we propose new constraints to include in a constraint-based optimisation approach to adversarial text generation. Our human evaluation shows that these constraints do improve the process by preventing the generation of especially odd or marked texts.
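
For illustration only, the following minimal Python sketch (not the authors' implementation; the vocabulary, function names, and entity-matching heuristic are assumptions) shows the kind of filter-style constraints the abstract describes: rejecting candidate perturbations that alter named entities or introduce non-English words. A check for inconsistent morphological inflections would additionally require a morphological analyser and is omitted here.

from typing import Set

# Stand-in English word list; a real implementation would use a full lexicon.
ENGLISH_VOCAB: Set[str] = {"the", "film", "movie", "was", "great", "terrible", "awful"}

def preserves_named_entities(entities: Set[str], perturbed_text: str) -> bool:
    # Constraint 1: every named entity in the original must survive verbatim.
    return all(entity in perturbed_text for entity in entities)

def only_english_words(perturbed_text: str) -> bool:
    # Constraint 2: every alphabetic token must come from the English word list.
    tokens = [t.strip(".,!?;:").lower() for t in perturbed_text.split()]
    return all(t in ENGLISH_VOCAB for t in tokens if t.isalpha())

def passes_constraints(entities: Set[str], perturbed_text: str) -> bool:
    # A candidate perturbation is kept only if it satisfies both checks.
    return preserves_named_entities(entities, perturbed_text) and only_english_words(perturbed_text)

# A candidate that swaps in a non-English word is rejected; an all-English one is kept.
print(passes_constraints(set(), "the film was magnifique"))  # False
print(passes_constraints(set(), "the film was terrible"))    # True

In a constraint-based attack, such checks would be applied to each candidate perturbation before it is scored, so that especially odd or marked texts never enter the search.
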
Anthology ID:
2025.alta-main.3
Volume:
Proceedings of The 23rd Annual Workshop of the Australasian Language Technology Association
Month:
November
Year:
2025
Address:
Sydney, Australia
Editors:
Jonathan K. Kummerfeld, Aditya Joshi, Mark Dras
Venue:
ALTA
Publisher:
Association for Computational Linguistics
Pages:
28–44
URL:
https://preview.aclanthology.org/ingest-alta/2025.alta-main.3/
Cite (ACL):
Shakila Mahjabin Tonni, Pedro Faustini, and Mark Dras. 2025. Some Odd Adversarial Perturbations and the Notion of Adversarial Closeness. In Proceedings of The 23rd Annual Workshop of the Australasian Language Technology Association, pages 28–44, Sydney, Australia. Association for Computational Linguistics.
Cite (Informal):
Some Odd Adversarial Perturbations and the Notion of Adversarial Closeness (Tonni et al., ALTA 2025)
PDF:
https://preview.aclanthology.org/ingest-alta/2025.alta-main.3.pdf