@inproceedings{chan-etal-2020-poison,
title = "Poison Attacks against Text Datasets with Conditional Adversarially Regularized Autoencoder",
author = "Chan, Alvin and
Tay, Yi and
Ong, Yew-Soon and
Zhang, Aston",
editor = "Cohn, Trevor and
He, Yulan and
Liu, Yang",
booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
month = nov,
year = "2020",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://preview.aclanthology.org/fix-sig-urls/2020.findings-emnlp.373/",
doi = "10.18653/v1/2020.findings-emnlp.373",
pages = "4175--4189",
abstract = "This paper demonstrates a fatal vulnerability in natural language inference (NLI) and text classification systems. More concretely, we present a `backdoor poisoning' attack on NLP models. Our poisoning attack utilizes conditional adversarially regularized autoencoder (CARA) to generate poisoned training samples by poison injection in latent space. Just by adding 1{\%} poisoned data, our experiments show that a victim BERT finetuned classifier{'}s predictions can be steered to the poison target class with success rates of $>80\%$ when the input hypothesis is injected with the poison signature, demonstrating that NLI and text classification systems face a huge security risk."
}