BadWindtunnel: Defending Backdoor in High-noise Simulated Training with Confidence Variance

Ruyi Zhang, Songlei Jian, Yusong Tan, Heng Gao, Haifang Zhou, Kai Lu


Abstract
Current backdoor defenses in Natural Language Processing (NLP) typically rely on data reduction or model pruning, which risk discarding crucial information. To address this challenge, we introduce a novel backdoor defender, BadWindtunnel, which builds a high-noise simulated training environment, analogous to a wind tunnel, that allows precise control over training conditions so that backdoor learning behavior can be modeled without affecting the final model. Within this simulated training, we use confidence variance as a metric to quantify learning behavior, exploiting two characteristics of backdoor-poisoned data (hereafter, poisoned data): higher learnability and robustness. We further propose a two-step strategy to model poisoned data: target-label identification followed by poisoned-data revealing. Extensive experiments demonstrate BadWindtunnel’s superiority, with a 21% higher average reduction in attack success rate than the second-best defender.
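The core intuition can be illustrated with a minimal sketch (hypothetical, not the authors' implementation): if poisoned examples are learned more quickly and robustly under high-noise simulated training, their per-epoch confidence trajectories should fluctuate less than those of clean examples, so a low confidence variance flags a suspicious example. All function names and thresholds below are illustrative assumptions.

```python
import random
import statistics

def confidence_variance(confidences):
    """Variance of one example's per-epoch confidence trajectory."""
    return statistics.pvariance(confidences)

def flag_suspicious(trajectories, quantile=0.1):
    """Flag examples whose confidence variance falls in the lowest quantile.

    `trajectories` maps example id -> list of per-epoch confidences
    recorded during the (simulated) noisy training run.
    """
    scores = {i: confidence_variance(c) for i, c in trajectories.items()}
    cutoff = sorted(scores.values())[max(0, int(len(scores) * quantile) - 1)]
    return {i for i, s in scores.items() if s <= cutoff}

# Toy simulation: pretend ids 0-9 are poisoned. Their confidence stays
# stably high under injected noise, while clean examples wobble.
random.seed(0)
trajectories = {}
for i in range(100):
    if i < 10:  # "poisoned": robustly learned, near-constant confidence
        trajectories[i] = [0.9 + random.uniform(-0.02, 0.02) for _ in range(20)]
    else:       # "clean": confidence varies widely epoch to epoch
        trajectories[i] = [random.uniform(0.3, 0.9) for _ in range(20)]

suspects = flag_suspicious(trajectories, quantile=0.1)
print(sorted(suspects))
```

In this toy setup the low-variance set recovers exactly the simulated poisoned ids; in practice the trajectories would come from logging model confidences during the high-noise simulated training rather than from synthetic draws.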
Anthology ID:
2025.findings-acl.482
Volume:
Findings of the Association for Computational Linguistics: ACL 2025
Month:
July
Year:
2025
Address:
Vienna, Austria
Editors:
Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
9259–9273
URL:
https://preview.aclanthology.org/mtsummit-25-ingestion/2025.findings-acl.482/
DOI:
10.18653/v1/2025.findings-acl.482
Cite (ACL):
Ruyi Zhang, Songlei Jian, Yusong Tan, Heng Gao, Haifang Zhou, and Kai Lu. 2025. BadWindtunnel: Defending Backdoor in High-noise Simulated Training with Confidence Variance. In Findings of the Association for Computational Linguistics: ACL 2025, pages 9259–9273, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
BadWindtunnel: Defending Backdoor in High-noise Simulated Training with Confidence Variance (Zhang et al., Findings 2025)
PDF:
https://preview.aclanthology.org/mtsummit-25-ingestion/2025.findings-acl.482.pdf