Counterfactually Augmented Data and Unintended Bias: The Case of Sexism and Hate Speech Detection
Indira Sen, Mattia Samory, Claudia Wagner, Isabelle Augenstein
Abstract
Counterfactually Augmented Data (CAD) aims to improve out-of-domain generalizability, an indicator of model robustness. The improvement is credited to promoting core features of the construct over spurious artifacts that happen to correlate with it. Yet, over-relying on core features may lead to unintended model bias. In particular, construct-driven CAD (perturbations of core features) may induce models to ignore the context in which core features are used. Here, we test models for sexism and hate speech detection on challenging data: non-hate and non-sexist usage of identity and gendered terms. On these hard cases, models trained on CAD, especially construct-driven CAD, show higher false positive rates than models trained on the original, unperturbed data. Using a diverse set of CAD, both construct-driven and construct-agnostic, reduces such unintended bias.
- Anthology ID: 2022.naacl-main.347
- Volume: Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
- Month: July
- Year: 2022
- Address: Seattle, United States
- Venue: NAACL
- Publisher: Association for Computational Linguistics
- Pages: 4716–4726
- URL: https://aclanthology.org/2022.naacl-main.347
- DOI: 10.18653/v1/2022.naacl-main.347
- Cite (ACL): Indira Sen, Mattia Samory, Claudia Wagner, and Isabelle Augenstein. 2022. Counterfactually Augmented Data and Unintended Bias: The Case of Sexism and Hate Speech Detection. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 4716–4726, Seattle, United States. Association for Computational Linguistics.
- Cite (Informal): Counterfactually Augmented Data and Unintended Bias: The Case of Sexism and Hate Speech Detection (Sen et al., NAACL 2022)
- PDF: https://preview.aclanthology.org/ingestion-script-update/2022.naacl-main.347.pdf