Smart Lexical Search for Label Flipping Adversial Attack
Alberto Gutiérrez-Megías, Salud María Jiménez-Zafra, L. Alfonso Ureña, Eugenio Martínez-Cámara
Abstract
Language models are vulnerable to adversarial attacks, which manipulate input data to disrupt their performance; accordingly, they represent a cybersecurity risk. These data manipulations are intended to be unidentifiable by both the learning model and humans, yet small changes can flip the final label of a classification task. Hence, we propose a novel attack built upon explainability methods to identify the salient lexical units to alter in order to flip the classification label. We assess our proposal on a disinformation dataset, and we show that our attack reaches a strong balance between stealthiness and efficiency.
- Anthology ID:
- 2024.privatenlp-1.11
- Volume:
- Proceedings of the Fifth Workshop on Privacy in Natural Language Processing
- Month:
- August
- Year:
- 2024
- Address:
- Bangkok, Thailand
- Editors:
- Ivan Habernal, Sepideh Ghanavati, Abhilasha Ravichander, Vijayanta Jain, Patricia Thaine, Timour Igamberdiev, Niloofar Mireshghallah, Oluwaseyi Feyisetan
- Venues:
- PrivateNLP | WS
- Publisher:
- Association for Computational Linguistics
- Pages:
- 97–106
- URL:
- https://aclanthology.org/2024.privatenlp-1.11
- Cite (ACL):
- Alberto Gutiérrez-Megías, Salud María Jiménez-Zafra, L. Alfonso Ureña, and Eugenio Martínez-Cámara. 2024. Smart Lexical Search for Label Flipping Adversial Attack. In Proceedings of the Fifth Workshop on Privacy in Natural Language Processing, pages 97–106, Bangkok, Thailand. Association for Computational Linguistics.
- Cite (Informal):
- Smart Lexical Search for Label Flipping Adversial Attack (Gutiérrez-Megías et al., PrivateNLP-WS 2024)
- PDF:
- https://aclanthology.org/2024.privatenlp-1.11.pdf
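The core idea of the abstract — use an explainability signal to find the most salient lexical units, then perturb only those until the classifier's label flips — can be sketched as follows. This is an illustrative toy, not the paper's actual method: the keyword-weighted classifier, the `[UNK]` substitution strategy, and all function names are invented for demonstration, and the saliency measure used here is simple leave-one-out occlusion.

```python
# Toy leave-one-out saliency guiding a greedy label-flipping attack.
# The classifier below is a stand-in: it scores "disinformation" probability
# from a small set of trigger words (all values invented for illustration).

TRIGGERS = {"shocking": 0.4, "miracle": 0.4, "exposed": 0.3}

def fake_news_score(tokens):
    """Toy classifier: pseudo-probability that the text is disinformation."""
    return min(1.0, 0.1 + sum(TRIGGERS.get(t.lower(), 0.0) for t in tokens))

def saliency(tokens):
    """Occlusion saliency: score drop when each token is removed in turn."""
    base = fake_news_score(tokens)
    return [base - fake_news_score(tokens[:i] + tokens[i + 1:])
            for i in range(len(tokens))]

def flip_label(tokens, substitute="[UNK]", threshold=0.5, max_edits=3):
    """Greedily replace the most salient token until the label flips.

    Editing few, highly salient tokens is what keeps the attack stealthy
    while remaining efficient (few model queries per edit).
    """
    tokens = list(tokens)
    for _ in range(max_edits):
        if fake_news_score(tokens) < threshold:
            break  # label already flipped to "not disinformation"
        scores = saliency(tokens)
        tokens[scores.index(max(scores))] = substitute
    return tokens

text = "shocking miracle cure exposed by doctors".split()
adversarial = flip_label(text)
```

In this sketch, `"shocking"` and `"miracle"` carry the highest occlusion saliency, so the greedy loop replaces them first and the score drops below the decision threshold after two edits; a real attack would swap in fluent synonyms from a language model rather than an `[UNK]` placeholder to stay imperceptible to human readers.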