Synonym-unaware Fast Adversarial Training against Textual Adversarial Attacks

Yichen Yang, Xin Liu, Kun He


Abstract
Numerous adversarial defense methods have been proposed to strengthen the robustness of Natural Language Processing (NLP) models against adversarial attacks. However, many of these methods rely on predetermined linguistic knowledge and assume that attackers’ synonym candidates are known, which is often unrealistic. In this work, we investigate adversarial training in the embedding space and introduce a Fast Adversarial Training (FAT) method to improve the model robustness without requiring synonym awareness. FAT leverages single-step perturbation generation and effective perturbation initialization based on two key insights: (1) adversarial perturbations generated by single-step and multi-step gradient ascent are similar, and (2) perturbations generated on the same training sample across successive epochs exhibit resemblance. By employing single-step gradient ascent and leveraging historical perturbation information, FAT not only expedites the training process but also efficiently initializes perturbations. Extensive experiments demonstrate that FAT significantly enhances the robustness of popular NLP models under scenarios where synonyms are unknown, outperforming other defense baselines under various character-level and word-level attacks.
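The abstract describes two ingredients: a single gradient-ascent step to craft the perturbation in embedding space, and re-using the perturbation from the previous epoch as the initialization for the same sample. The toy sketch below illustrates that combination on a logistic-regression "model" over embedding vectors; it is an illustrative reconstruction, not the paper's exact algorithm, and the epsilon ball, normalization, and `memory` structure are all assumed details.

```python
import numpy as np

def loss_and_grad_wrt_input(x, y, w):
    # Logistic loss on embedding x; returns loss and dL/dx.
    p = 1.0 / (1.0 + np.exp(-(x @ w)))
    loss = -(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))
    return loss, (p - y) * w  # gradient w.r.t. the embedding

def single_step_perturbation(x, y, w, eps, init=None):
    """One gradient-ascent step in embedding space (single-step AT).
    `init` is the perturbation stored for this sample in the previous
    epoch; using it as a warm start is the 'historical information'."""
    x0 = x + init if init is not None else x
    _, g = loss_and_grad_wrt_input(x0, y, w)
    delta = eps * g / (np.linalg.norm(g) + 1e-12)
    if init is not None:
        delta = delta + init
        n = np.linalg.norm(delta)
        if n > eps:                       # project back onto the eps-ball
            delta = eps * delta / n
    return delta

# Toy training loop with a perturbation memory across epochs.
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 4))
Y = (X[:, 0] > 0).astype(float)
w = rng.normal(size=4)
eps, lr = 0.1, 0.5
memory = {}                               # sample index -> last perturbation
for epoch in range(3):
    for i in range(len(X)):
        delta = single_step_perturbation(X[i], Y[i], w, eps, memory.get(i))
        memory[i] = delta
        # Train on the adversarially perturbed embedding.
        xp = X[i] + delta
        p = 1.0 / (1.0 + np.exp(-(xp @ w)))
        w -= lr * (p - Y[i]) * xp         # dL/dw for logistic regression
```

Because only one forward/backward pass per sample is needed to build the perturbation, each epoch costs roughly the same as standard training, while the warm start recovers some of the strength of multi-step attacks.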
Anthology ID:
2025.findings-naacl.43
Volume:
Findings of the Association for Computational Linguistics: NAACL 2025
Month:
April
Year:
2025
Address:
Albuquerque, New Mexico
Editors:
Luis Chiruzzo, Alan Ritter, Lu Wang
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
727–739
URL:
https://preview.aclanthology.org/fix-sig-urls/2025.findings-naacl.43/
Cite (ACL):
Yichen Yang, Xin Liu, and Kun He. 2025. Synonym-unaware Fast Adversarial Training against Textual Adversarial Attacks. In Findings of the Association for Computational Linguistics: NAACL 2025, pages 727–739, Albuquerque, New Mexico. Association for Computational Linguistics.
Cite (Informal):
Synonym-unaware Fast Adversarial Training against Textual Adversarial Attacks (Yang et al., Findings 2025)
PDF:
https://preview.aclanthology.org/fix-sig-urls/2025.findings-naacl.43.pdf