A Comprehensive Taxonomy of Bias Mitigation Methods for Hate Speech Detection

Jan Fillies, Marius Wawerek, Adrian Paschke


Abstract
Algorithmic hate speech detection is widely used today. However, biases within these systems can lead to discrimination. This research presents an overview of bias mitigation strategies in the field of hate speech detection. The identified strategies are grouped into four categories based on their operating principles, and a novel taxonomy of bias mitigation methods is proposed. The mitigation strategies are characterized by their key concepts and analyzed in terms of their application stage and their need for knowledge of protected attributes. Additionally, the paper discusses potential combinations of these strategies. This research shifts the focus from identifying present biases to examining the similarities and differences between mitigation strategies, thereby facilitating the exchange, stacking, and ensembling of these strategies in future research.
Anthology ID:
2025.woah-1.1
Volume:
Proceedings of the 9th Workshop on Online Abuse and Harms (WOAH)
Month:
August
Year:
2025
Address:
Vienna, Austria
Editors:
Agostina Calabrese, Christine de Kock, Debora Nozza, Flor Miriam Plaza-del-Arco, Zeerak Talat, Francielle Vargas
Venues:
WOAH | WS
Publisher:
Association for Computational Linguistics
Pages:
1–16
URL:
https://preview.aclanthology.org/landing_page/2025.woah-1.1/
Cite (ACL):
Jan Fillies, Marius Wawerek, and Adrian Paschke. 2025. A Comprehensive Taxonomy of Bias Mitigation Methods for Hate Speech Detection. In Proceedings of the 9th Workshop on Online Abuse and Harms (WOAH), pages 1–16, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
A Comprehensive Taxonomy of Bias Mitigation Methods for Hate Speech Detection (Fillies et al., WOAH 2025)
PDF:
https://preview.aclanthology.org/landing_page/2025.woah-1.1.pdf
Supplementary material:
2025.woah-1.1.SupplementaryMaterial.zip