Abstract
We propose a novel gradient-based attack against transformer-based language models that searches for an adversarial example in a continuous space of token probabilities. Our algorithm mitigates the gap between the adversarial loss for continuous and discrete text representations by performing multi-step quantization in a quantization-compensation loop. Experiments show that our method significantly outperforms other approaches on various natural language processing (NLP) tasks.
- Anthology ID:
- 2023.eacl-main.149
- Volume:
- Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics
- Month:
- May
- Year:
- 2023
- Address:
- Dubrovnik, Croatia
- Editors:
- Andreas Vlachos, Isabelle Augenstein
- Venue:
- EACL
- Publisher:
- Association for Computational Linguistics
- Pages:
- 2038–2048
- URL:
- https://aclanthology.org/2023.eacl-main.149
- DOI:
- 10.18653/v1/2023.eacl-main.149
- Cite (ACL):
- Piotr Gaiński and Klaudia Bałazy. 2023. Step by Step Loss Goes Very Far: Multi-Step Quantization for Adversarial Text Attacks. In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, pages 2038–2048, Dubrovnik, Croatia. Association for Computational Linguistics.
- Cite (Informal):
- Step by Step Loss Goes Very Far: Multi-Step Quantization for Adversarial Text Attacks (Gaiński & Bałazy, EACL 2023)
- PDF:
- https://preview.aclanthology.org/proper-vol2-ingestion/2023.eacl-main.149.pdf