CASN:Class-Aware Score Network for Textual Adversarial Detection

Rong Bao, Rui Zheng, Liang Ding, Qi Zhang, Dacheng Tao


Abstract
Adversarial detection aims to identify adversarial samples that threaten the security of deep neural networks, an essential step toward building robust AI systems. Density-based estimation is widely considered an effective technique: it explicitly models the distribution of normal data and identifies adversarial samples as outliers. However, these methods suffer significant performance degradation when adversarial samples lie close to the non-adversarial data manifold. To address this limitation, we propose a score-based generative method that implicitly models the data distribution. Our approach uses the gradient of the log-density of the data distribution and calculates the distribution gap between adversarial and normal samples through multi-step iterations of Langevin dynamics. In addition, we use supervised contrastive learning to guide the gradient estimation with label information, which avoids collapsing to a single data manifold and better preserves the anisotropy of the differently labeled data distributions. Experimental results on three text classification tasks under four advanced attack algorithms show that our approach significantly improves over previous detection methods (average +15.2 F1 score against the previous SOTA).
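The idea of following the gradient of the log-density with Langevin dynamics can be illustrated on toy data. The sketch below is not the authors' implementation: it assumes a 2-D isotropic Gaussian whose score is known in closed form (CASN instead learns the score with a network over text representations), and the names `gaussian_score` and `langevin_displacement` are hypothetical. It uses the distance a sample travels under the Langevin updates as a stand-in for the distribution gap.

```python
import numpy as np


def gaussian_score(x, mean, var):
    """Score (gradient of the log-density) of an isotropic Gaussian N(mean, var * I).

    Assumption for illustration only: a real detector would use a learned
    score network rather than a closed-form density.
    """
    return -(x - mean) / var


def langevin_displacement(x0, score_fn, step_size=0.05, n_steps=100,
                          noise_scale=1.0, seed=0):
    """Run Langevin updates from x0 and return (distance moved, final point).

    Update rule: x <- x + (step_size / 2) * score(x)
                       + noise_scale * sqrt(step_size) * z,  z ~ N(0, I).
    Setting noise_scale=0.0 gives a deterministic drift toward high density.
    """
    rng = np.random.default_rng(seed)
    start = np.asarray(x0, dtype=float)
    x = start.copy()
    for _ in range(n_steps):
        z = rng.standard_normal(x.shape)
        x = x + 0.5 * step_size * score_fn(x) + noise_scale * np.sqrt(step_size) * z
    return float(np.linalg.norm(x - start)), x


# Toy "normal data" distribution: standard Gaussian centred at the origin.
mean = np.zeros(2)
score = lambda x: gaussian_score(x, mean, 1.0)

# Noise is disabled here so the comparison is deterministic.
near, _ = langevin_displacement([0.5, 0.0], score, noise_scale=0.0)
far, _ = langevin_displacement([5.0, 0.0], score, noise_scale=0.0)
print(f"displacement near the data: {near:.3f}, far from the data: {far:.3f}")
```

In this toy setting a point far from the high-density region moves much further than a point already near it, so thresholding the displacement separates the two. How the gap is actually computed and thresholded in CASN is described in the paper itself.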
Anthology ID:
2023.acl-long.40
Volume:
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
July
Year:
2023
Address:
Toronto, Canada
Editors:
Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
671–687
URL:
https://aclanthology.org/2023.acl-long.40
DOI:
10.18653/v1/2023.acl-long.40
Cite (ACL):
Rong Bao, Rui Zheng, Liang Ding, Qi Zhang, and Dacheng Tao. 2023. CASN:Class-Aware Score Network for Textual Adversarial Detection. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 671–687, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):
CASN:Class-Aware Score Network for Textual Adversarial Detection (Bao et al., ACL 2023)
PDF:
https://preview.aclanthology.org/emnlp-22-attachments/2023.acl-long.40.pdf
Video:
https://preview.aclanthology.org/emnlp-22-attachments/2023.acl-long.40.mp4