Mitigating Data Poisoning in Text Classification with Differential Privacy
Chang Xu, Jun Wang, Francisco Guzmán, Benjamin Rubinstein, Trevor Cohn
Abstract
NLP models are vulnerable to data poisoning attacks. One type of attack can plant a backdoor in a model by injecting poisoned examples in training, causing the victim model to misclassify test instances which include a specific pattern. Although defences exist to counter these attacks, they are specific to an attack type or pattern. In this paper, we propose a generic defence mechanism by making the training process robust to poisoning attacks through gradient shaping methods, based on differentially private training. We show that our method is highly effective in mitigating, or even eliminating, poisoning attacks on text classification, with only a small cost in predictive accuracy.
- Anthology ID:
- 2021.findings-emnlp.369
- Volume:
- Findings of the Association for Computational Linguistics: EMNLP 2021
- Month:
- November
- Year:
- 2021
- Address:
- Punta Cana, Dominican Republic
- Editors:
- Marie-Francine Moens, Xuanjing Huang, Lucia Specia, Scott Wen-tau Yih
- Venue:
- Findings
- SIG:
- SIGDAT
- Publisher:
- Association for Computational Linguistics
- Pages:
- 4348–4356
- URL:
- https://aclanthology.org/2021.findings-emnlp.369
- DOI:
- 10.18653/v1/2021.findings-emnlp.369
- Cite (ACL):
- Chang Xu, Jun Wang, Francisco Guzmán, Benjamin Rubinstein, and Trevor Cohn. 2021. Mitigating Data Poisoning in Text Classification with Differential Privacy. In Findings of the Association for Computational Linguistics: EMNLP 2021, pages 4348–4356, Punta Cana, Dominican Republic. Association for Computational Linguistics.
- Cite (Informal):
- Mitigating Data Poisoning in Text Classification with Differential Privacy (Xu et al., Findings 2021)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-2/2021.findings-emnlp.369.pdf
- Data:
- IMDb Movie Reviews
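The abstract describes the defence as gradient shaping based on differentially private training. The sketch below shows one DP-SGD-style update: it computes each example's gradient separately, clips it to a maximum L2 norm, and adds Gaussian noise before averaging. The intuition is that clipping bounds how far any single training example (poisoned or not) can move the parameters, and the noise further masks that bounded contribution. This is a minimal illustration under stated assumptions, not the paper's implementation. It uses a toy logistic-regression classifier on dense features instead of the paper's text models, and the function names (`per_example_grads`, `clip_rows`, `dp_sgd_step`) and hyperparameter values are hypothetical.

```python
import numpy as np


def per_example_grads(w, X, y):
    """Per-example gradients of binary cross-entropy for logistic regression.

    Returns an (n, d) array whose i-th row is the gradient for example i alone.
    """
    p = 1.0 / (1.0 + np.exp(-(X @ w)))  # predicted probability of label 1
    return (p - y)[:, None] * X


def clip_rows(g, clip_norm):
    """Scale each row to L2 norm <= clip_norm; rows already inside the bound are unchanged."""
    norms = np.linalg.norm(g, axis=1, keepdims=True)
    return g / np.maximum(1.0, norms / clip_norm)


def dp_sgd_step(w, X, y, lr=0.1, clip_norm=1.0, noise_multiplier=1.0, rng=None):
    """One DP-SGD-style update on a mini-batch (X, y).

    1. compute every example's gradient separately,
    2. clip each to norm <= clip_norm, bounding the influence of any one example,
    3. sum, add Gaussian noise with std noise_multiplier * clip_norm, then average.
    """
    rng = np.random.default_rng() if rng is None else rng
    g = clip_rows(per_example_grads(w, X, y), clip_norm)
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=w.shape)
    return w - lr * (g.sum(axis=0) + noise) / len(X)


if __name__ == "__main__":
    # Toy usage on synthetic data (hypothetical settings, not from the paper).
    rng = np.random.default_rng(0)
    X = rng.normal(size=(64, 5))
    y = (X[:, 0] > 0).astype(float)
    w = np.zeros(5)
    for _ in range(200):
        w = dp_sgd_step(w, X, y, lr=0.5, clip_norm=1.0, noise_multiplier=0.5, rng=rng)
    print("learned weights:", np.round(w, 2))
```

Setting `noise_multiplier=0.0` leaves only the clipping step; the `clip_norm` and `noise_multiplier` knobs trade robustness against predictive accuracy, which mirrors the small accuracy cost the abstract reports.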