Towards Fairness Assessment of Dutch Hate Speech Detection
Julie Bauer, Rishabh Kaushal, Thales Bertaglia, Adriana Iamnitchi
Abstract
Numerous studies have proposed computational methods to detect hate speech online, yet most focus on the English language and emphasize model development. In this study, we evaluate the counterfactual fairness of hate speech detection models in the Dutch language, specifically examining the performance and fairness of transformer-based models. We make the following key contributions. First, we curate a list of Dutch Social Group Terms that reflect social context. Second, we generate counterfactual data for Dutch hate speech using LLMs and established strategies like Manual Group Substitution (MGS) and Sentence Log-Likelihood (SLL). Through qualitative evaluation, we highlight the challenges of generating realistic counterfactuals, particularly with Dutch grammar and contextual coherence. Third, we fine-tune baseline transformer-based models with counterfactual data and evaluate their performance in detecting hate speech. Fourth, we assess the fairness of these models using Counterfactual Token Fairness (CTF) and group fairness metrics, including equality of odds and demographic parity. Our analysis shows that models perform better in terms of hate speech detection, average counterfactual fairness and group fairness. This work addresses a significant gap in the literature on counterfactual fairness for hate speech detection in Dutch and provides practical insights and recommendations for improving both model performance and fairness.
- Anthology ID:
- 2025.woah-1.28
- Volume:
- Proceedings of the 9th Workshop on Online Abuse and Harms (WOAH)
- Month:
- August
- Year:
- 2025
- Address:
- Vienna, Austria
- Editors:
- Agostina Calabrese, Christine de Kock, Debora Nozza, Flor Miriam Plaza-del-Arco, Zeerak Talat, Francielle Vargas
- Venues:
- WOAH | WS
- Publisher:
- Association for Computational Linguistics
- Pages:
- 312–324
- URL:
- https://preview.aclanthology.org/landing_page/2025.woah-1.28/
- Cite (ACL):
- Julie Bauer, Rishabh Kaushal, Thales Bertaglia, and Adriana Iamnitchi. 2025. Towards Fairness Assessment of Dutch Hate Speech Detection. In Proceedings of the 9th Workshop on Online Abuse and Harms (WOAH), pages 312–324, Vienna, Austria. Association for Computational Linguistics.
- Cite (Informal):
- Towards Fairness Assessment of Dutch Hate Speech Detection (Bauer et al., WOAH 2025)
- PDF:
- https://preview.aclanthology.org/landing_page/2025.woah-1.28.pdf