Word-Level Detection of Code-Mixed Hate Speech with Multilingual Domain Transfer

Karin Niederreiter, Dagmar Gromann


Abstract
The exponential growth of offensive language on social media tends to fuel online harassment and challenges detection mechanisms. Hate speech detection is commonly treated as a monolingual or multilingual sentence-level classification task. However, profane language tends to contain code-mixing, a combination of more than one language, which requires a more nuanced detection approach than binary classification. A general lack of available code-mixed datasets aggravates the problem. To address this issue, we propose five word-level annotated hate speech datasets: one EN and one DE dataset from social networks, one subset of the DE-EN Offensive Content Detection Code-Switched Dataset, one DE-EN code-mixed German rap lyrics held-out test set, and a cross-domain held-out test set. We investigate the capacity of fine-tuned German-only, German-English bilingual, and German-English code-mixed token classification XLM-R models to generalize to code-mixed hate speech in German rap lyrics in zero-shot domain transfer as well as across different domains. The results show that bilingual fine-tuning facilitates the detection not only of code-mixed hate speech but also of neologisms, addressing the inherent dynamics of profane language use.
Anthology ID:
2025.findings-acl.1086
Volume:
Findings of the Association for Computational Linguistics: ACL 2025
Month:
July
Year:
2025
Address:
Vienna, Austria
Editors:
Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
21093–21104
URL:
https://preview.aclanthology.org/landing_page/2025.findings-acl.1086/
Cite (ACL):
Karin Niederreiter and Dagmar Gromann. 2025. Word-Level Detection of Code-Mixed Hate Speech with Multilingual Domain Transfer. In Findings of the Association for Computational Linguistics: ACL 2025, pages 21093–21104, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
Word-Level Detection of Code-Mixed Hate Speech with Multilingual Domain Transfer (Niederreiter & Gromann, Findings 2025)
PDF:
https://preview.aclanthology.org/landing_page/2025.findings-acl.1086.pdf