Detecting Adversarial Samples through Sharpness of Loss Landscape
Rui Zheng, Shihan Dou, Yuhao Zhou, Qin Liu, Tao Gui, Qi Zhang, Zhongyu Wei, Xuanjing Huang, Menghan Zhang
Abstract
Deep neural networks (DNNs) have been proven to be sensitive to perturbations on input samples, and previous works highlight that adversarial samples are even more vulnerable than normal ones. In this work, this phenomenon is illustrated from the perspective of sharpness via visualizing the input loss landscape of models. We first show that adversarial samples are located in steep and narrow local minima of the loss landscape (high sharpness), while normal samples, which differ distinctly from adversarial ones, reside in flatter regions of the loss surface (low sharpness). Based on this, we propose a simple and effective sharpness-based detector to distinguish adversarial samples by maximizing the loss increment within the region where the inference sample is located. Considering that the notion of sharpness of a loss landscape is relative, we further propose an adaptive optimization strategy in an attempt to fairly compare the relative sharpness among different samples. Experimental results show that our approach can outperform previous detection methods by large margins (average +6.6 F1 score) for four advanced attack strategies considered in this paper across three text classification tasks.
- Anthology ID:
- 2023.findings-acl.717
- Volume:
- Findings of the Association for Computational Linguistics: ACL 2023
- Month:
- July
- Year:
- 2023
- Address:
- Toronto, Canada
- Venue:
- Findings
- Publisher:
- Association for Computational Linguistics
- Pages:
- 11282–11298
- URL:
- https://aclanthology.org/2023.findings-acl.717
- Cite (ACL):
- Rui Zheng, Shihan Dou, Yuhao Zhou, Qin Liu, Tao Gui, Qi Zhang, Zhongyu Wei, Xuanjing Huang, and Menghan Zhang. 2023. Detecting Adversarial Samples through Sharpness of Loss Landscape. In Findings of the Association for Computational Linguistics: ACL 2023, pages 11282–11298, Toronto, Canada. Association for Computational Linguistics.
- Cite (Informal):
- Detecting Adversarial Samples through Sharpness of Loss Landscape (Zheng et al., Findings 2023)
- PDF:
- https://preview.aclanthology.org/starsem-semeval-split/2023.findings-acl.717.pdf
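
The abstract describes scoring an inference sample by maximizing the loss increment within the region around it. A minimal sketch of that idea follows, under loud assumptions: the toy logistic model, the L2 ball, the projected gradient ascent loop, and the names `loss_and_grad`, `sharpness_score`, `radius`, `steps`, and `step_size` are all hypothetical choices for illustration and are not taken from the paper. The paper's actual models, perturbation space, and adaptive optimization strategy are not reproduced here.

```python
import numpy as np


def loss_and_grad(w, b, x, y):
    """Cross-entropy loss of a toy logistic model and its gradient w.r.t. the input x."""
    z = float(w @ x + b)
    p = 1.0 / (1.0 + np.exp(-z))
    tiny = 1e-12  # avoids log(0)
    loss = -(y * np.log(p + tiny) + (1.0 - y) * np.log(1.0 - p + tiny))
    grad_x = (p - y) * w  # d(loss)/dx for the logistic model
    return loss, grad_x


def sharpness_score(w, b, x, radius=0.5, steps=10, step_size=0.1):
    """Largest loss increase found inside an L2 ball of the given radius around x.

    Assumption: the model's own prediction is used as the label, since no
    gold label is available at inference time.
    """
    y = 1.0 if float(w @ x + b) >= 0.0 else 0.0
    base_loss, _ = loss_and_grad(w, b, x, y)
    delta = np.zeros_like(x, dtype=float)
    best = 0.0
    for _ in range(steps):
        _, grad = loss_and_grad(w, b, x + delta, y)
        grad_norm = np.linalg.norm(grad)
        if grad_norm == 0.0:
            break
        # Ascent step along the normalized gradient, then project back into the ball.
        delta = delta + step_size * grad / grad_norm
        delta_norm = np.linalg.norm(delta)
        if delta_norm > radius:
            delta = delta * (radius / delta_norm)
        loss, _ = loss_and_grad(w, b, x + delta, y)
        best = max(best, loss - base_loss)
    return best


# Toy usage: a point close to the decision boundary sits in a sharper region
# than a point far from it, so it receives a larger score.
w = np.array([1.0, 0.0])
b = 0.0
near_boundary = np.array([0.3, 0.0])
far_from_boundary = np.array([5.0, 0.0])
score_near = sharpness_score(w, b, near_boundary)
score_far = sharpness_score(w, b, far_from_boundary)
```

In a detector built this way, a sample would be flagged as adversarial when its score exceeds a threshold chosen on held-out data. That thresholding step is also an assumption of this sketch, not a detail stated in the abstract.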