BadWindtunnel: Defending Backdoor in High-noise Simulated Training with Confidence Variance
Ruyi Zhang | Songlei Jian | Yusong Tan | Heng Gao | Haifang Zhou | Kai Lu
Findings of the Association for Computational Linguistics: ACL 2025
Current backdoor defenses in Natural Language Processing (NLP) typically rely on data reduction or model pruning, risking the loss of crucial information. To address this challenge, we introduce a novel backdoor defender, BadWindtunnel, which builds a high-noise simulated training environment, analogous to a wind tunnel, that allows precise control over training conditions so that backdoor learning behavior can be modeled without affecting the final model. Within this simulated training, we use confidence variance as a metric to quantify learning behavior, exploiting two characteristics of backdoor-poisoned data (hereafter, poisoned data): higher learnability and greater robustness to noise. In addition, we propose a two-step strategy to further model poisoned data: target label identification followed by poisoned data revealing. Extensive experiments demonstrate BadWindtunnel’s superiority, with a 21% higher average reduction in attack success rate than the second-best defender.
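The abstract does not spell out how confidence variance is computed or thresholded; a minimal sketch of one plausible reading, assuming "confidence" means the softmax probability the model assigns to each sample's (possibly poisoned) label, recorded once per epoch of the noisy simulated run, might look like the following. The function names and the quantile cutoff are illustrative assumptions, not the paper's actual procedure.

```python
import numpy as np

def confidence_variance(confidences: np.ndarray) -> np.ndarray:
    """Per-sample variance of label confidence across simulated-training epochs.

    confidences: array of shape (num_epochs, num_samples), where
    confidences[e, i] is the softmax probability assigned to sample i's
    labeled class at epoch e of the high-noise simulated run.
    """
    return confidences.var(axis=0)

def flag_suspicious(confidences: np.ndarray, quantile: float = 0.1) -> np.ndarray:
    """Flag samples whose confidence variance is unusually low.

    Rationale (per the abstract): poisoned data is more learnable and more
    robust to training noise, so its confidence should stabilize early and
    fluctuate little across epochs. The quantile threshold here is an
    assumed, illustrative decision rule.
    """
    var = confidence_variance(confidences)
    return var <= np.quantile(var, quantile)
```

In this reading, samples flagged by the low-variance rule would then feed the two-step stage: identifying the attack's target label among them, then revealing which individual examples are poisoned.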