Defensive Dual Masking for Robust Adversarial Defense

Wangli Yang, Jie Yang, Yi Guo, Johan Barthelemy


Abstract
Adversarial defenses for textual data have gained considerable attention in recent years due to the increasing vulnerability of Natural Language Processing (NLP) models to adversarial attacks. These attacks exploit subtle perturbations in input text to deceive models, posing significant challenges to model robustness and reliability. This article introduces Defensive Dual Masking (DDM), a simple yet effective algorithm that uses two distinct masking strategies to mitigate adversarial threats. Specifically, during training, [MASK] tokens are inserted directly into input samples to prepare the model for handling perturbed inputs. At inference time, suspicious tokens are identified and replaced with [MASK] tokens, neutralizing perturbations while preserving the core semantics of the input text. The theoretical foundation of DDM demonstrates how the proposed masking strategies enhance the model's capacity to mitigate adversarial attacks. Empirical evaluations on four benchmark datasets and four adversarial attacks consistently demonstrate that DDM outperforms state-of-the-art defense techniques, achieving superior robustness and substantial improvements in model accuracy. Furthermore, DDM integrates seamlessly with Large Language Models, enhancing their resilience to adversarial attacks and providing a scalable defense solution for large-scale NLP applications.
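The two masking steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say where [MASK] tokens are inserted during training or how suspicious tokens are identified at inference. The function names, the random insertion policy, and the frequency-based `rarity_scores` heuristic below are all assumptions chosen only to make the idea concrete.

```python
import random

MASK = "[MASK]"


def insert_masks(tokens, num_masks, rng):
    """Training-time step (assumed policy): insert [MASK] tokens at random positions."""
    out = list(tokens)
    for _ in range(num_masks):
        # randint is inclusive on both ends, so a mask may also be appended.
        pos = rng.randint(0, len(out))
        out.insert(pos, MASK)
    return out


def rarity_scores(tokens, vocab_counts):
    """Hypothetical suspicion score: tokens seen less often score higher."""
    return [1.0 / (1 + vocab_counts.get(tok, 0)) for tok in tokens]


def mask_suspicious(tokens, scores, k):
    """Inference-time step: replace the k highest-scoring tokens with [MASK]."""
    if len(tokens) != len(scores):
        raise ValueError("tokens and scores must have the same length")
    ranked = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    chosen = set(ranked[:k])
    return [MASK if i in chosen else tok for i, tok in enumerate(tokens)]


if __name__ == "__main__":
    sentence = ["the", "movie", "was", "terrlble"]  # last token is a perturbed word
    counts = {"the": 1000, "movie": 50, "was": 800}  # toy frequency table

    print(insert_masks(sentence, 2, random.Random(0)))
    print(mask_suspicious(sentence, rarity_scores(sentence, counts), k=1))
```

In this toy setup the misspelled token is absent from the frequency table, so it receives the highest score and is the one replaced. A real system would need a principled scoring function and a model trained on mask-augmented inputs so that [MASK] tokens at inference are not out of distribution.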
Anthology ID:
2026.cl-1.5
Volume:
Computational Linguistics, Volume 52, Issue 1 - March 2026
Month:
March
Year:
2026
Address:
Cambridge, MA
Venue:
CL
Publisher:
MIT Press
Pages:
151–190
URL:
https://preview.aclanthology.org/ingest-latest-mitpress-cl-tacl/2026.cl-1.5/
DOI:
10.1162/coli.a.574
Cite (ACL):
Wangli Yang, Jie Yang, Yi Guo, and Johan Barthelemy. 2026. Defensive Dual Masking for Robust Adversarial Defense. Computational Linguistics, 52(1):151–190.
Cite (Informal):
Defensive Dual Masking for Robust Adversarial Defense (Yang et al., CL 2026)
PDF:
https://preview.aclanthology.org/ingest-latest-mitpress-cl-tacl/2026.cl-1.5.pdf