Debiasing Static Embeddings for Hate Speech Detection

Ling Sun, Soyoung Kim, Xiao Dong, Sandra Kübler


Abstract
We examine how embedding bias affects hate speech detection by evaluating two debiasing methods—hard-debiasing and soft-debiasing. We analyze stereotype and sentiment associations within the embedding space and assess whether debiased models reduce censorship of marginalized authors while improving detection of hate speech targeting these groups. Our findings highlight how embedding bias propagates into downstream tasks and demonstrate how well different embedding bias metrics can predict bias in hate speech detection.
Anthology ID:
2025.woah-1.8
Volume:
Proceedings of the 9th Workshop on Online Abuse and Harms (WOAH)
Month:
August
Year:
2025
Address:
Vienna, Austria
Editors:
Agostina Calabrese, Christine de Kock, Debora Nozza, Flor Miriam Plaza-del-Arco, Zeerak Talat, Francielle Vargas
Venues:
WOAH | WS
Publisher:
Association for Computational Linguistics
Pages:
67–76
URL:
https://preview.aclanthology.org/landing_page/2025.woah-1.8/
Cite (ACL):
Ling Sun, Soyoung Kim, Xiao Dong, and Sandra Kübler. 2025. Debiasing Static Embeddings for Hate Speech Detection. In Proceedings of the 9th Workshop on Online Abuse and Harms (WOAH), pages 67–76, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
Debiasing Static Embeddings for Hate Speech Detection (Sun et al., WOAH 2025)
PDF:
https://preview.aclanthology.org/landing_page/2025.woah-1.8.pdf