On the Universal Adversarial Perturbations for Efficient Data-free Adversarial Detection

SongYang Gao, Shihan Dou, Qi Zhang, Xuanjing Huang, Jin Ma, Ying Shan


Abstract
Detecting adversarial samples that are carefully crafted to fool a model is a critical step toward socially secure applications. However, existing adversarial detection methods require access to sufficient training data, which raises notable concerns about privacy leakage and generalizability. In this work, we validate that adversarial samples generated by attack algorithms are strongly related to a specific vector in the high-dimensional input space. Such vectors, namely UAPs (Universal Adversarial Perturbations), can be computed without the original training data. Based on this discovery, we propose a data-agnostic adversarial detection framework that induces distinct responses to UAPs in normal versus adversarial samples. Experimental results show that our method achieves competitive detection performance on various text classification tasks while keeping its time cost equivalent to that of normal inference.
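The abstract's core intuition, that clean and adversarial inputs react differently to a universal perturbation computed without training data, can be illustrated on a toy model. This is a minimal sketch under stated assumptions, not the authors' method: it uses a hypothetical linear classifier (`W`, `B`), treats the scaled negative weight direction as a data-free UAP (a property that holds only for linear models), and assumes adversarial inputs sit near the decision boundary, so the hypothetical rule `is_suspicious` flags any input whose predicted label flips under the perturbation.

```python
import math

# Hypothetical toy binary classifier: score(x) = W . x + B, class 1 if score > 0.
# Weights are illustrative only; the paper targets text classifiers.
W = [2.0, -1.0, 0.5]
B = 0.1


def score(x):
    return sum(wi * xi for wi, xi in zip(W, x)) + B


def predict(x):
    return 1 if score(x) > 0 else 0


def data_free_uap(weights, eps):
    """Toy data-free UAP: for a linear model, u = -eps * w / ||w|| lowers every
    input's score by exactly eps * ||w||, so one vector works for all inputs and
    needs no training data. Real UAP computation for deep models differs."""
    norm = math.sqrt(sum(w * w for w in weights))
    return [-eps * w / norm for w in weights]


def flips_under_uap(x, u):
    """Push x toward the opposite class with u (u lowers the score, so add it
    to class-1 inputs and subtract it from class-0 inputs); report a flip."""
    original = predict(x)
    sign = 1.0 if original == 1 else -1.0
    perturbed = [xi + sign * ui for xi, ui in zip(x, u)]
    return predict(perturbed) != original


def is_suspicious(x, u):
    # Hypothetical detection rule: near-boundary (likely adversarial) inputs
    # change label under the universal perturbation; high-margin inputs do not.
    return flips_under_uap(x, u)


if __name__ == "__main__":
    u = data_free_uap(W, eps=0.5)
    print(is_suspicious([2.0, 0.0, 0.0], u))   # False: clean, large margin
    print(is_suspicious([-2.0, 0.0, 0.0], u))  # False: clean, large margin
    print(is_suspicious([0.1, 0.15, 0.0], u))  # True: near-boundary input flips
```

The check adds one perturbed forward pass per input and uses no training data, which loosely mirrors the abstract's data-free and efficiency claims; the paper's actual detection procedure for text models may differ.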
Anthology ID:
2023.findings-acl.857
Volume:
Findings of the Association for Computational Linguistics: ACL 2023
Month:
July
Year:
2023
Address:
Toronto, Canada
Editors:
Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
13573–13581
URL:
https://aclanthology.org/2023.findings-acl.857
DOI:
10.18653/v1/2023.findings-acl.857
Cite (ACL):
SongYang Gao, Shihan Dou, Qi Zhang, Xuanjing Huang, Jin Ma, and Ying Shan. 2023. On the Universal Adversarial Perturbations for Efficient Data-free Adversarial Detection. In Findings of the Association for Computational Linguistics: ACL 2023, pages 13573–13581, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):
On the Universal Adversarial Perturbations for Efficient Data-free Adversarial Detection (Gao et al., Findings 2023)
PDF:
https://preview.aclanthology.org/emnlp-22-attachments/2023.findings-acl.857.pdf