@inproceedings{zhang-etal-2023-mitigating,
title = "Mitigating Biases in Hate Speech Detection from A Causal Perspective",
author = "Zhang, Zhehao and
Chen, Jiaao and
Yang, Diyi",
editor = "Bouamor, Houda and
Pino, Juan and
Bali, Kalika",
booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2023",
month = dec,
year = "2023",
address = "Singapore",
publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.findings-emnlp.440/",
doi = "10.18653/v1/2023.findings-emnlp.440",
pages = "6610--6625",
abstract = "Nowadays, many hate speech detectors are built to automatically detect hateful content. However, their training sets are sometimes skewed towards certain stereotypes (e.g., race or religion-related). As a result, the detectors are prone to depend on some shortcuts for predictions. Previous works mainly focus on token-level analysis and heavily rely on human experts' annotations to identify spurious correlations, which is not only costly but also incapable of discovering higher-level artifacts. In this work, we use grammar induction to find grammar patterns for hate speech and analyze this phenomenon from a causal perspective. Concretely, we categorize and verify different biases based on their spuriousness and influence on the model prediction. Then, we propose two mitigation approaches including Multi-Task Intervention and Data-Specific Intervention based on these confounders. Experiments conducted on 9 hate speech datasets demonstrate the effectiveness of our approaches."
}