Karin Niederreiter


2025

Word-Level Detection of Code-Mixed Hate Speech with Multilingual Domain Transfer
Karin Niederreiter | Dagmar Gromann
Findings of the Association for Computational Linguistics: ACL 2025

The exponential growth of offensive language on social media fuels online harassment and challenges detection mechanisms. Hate speech detection is commonly treated as a monolingual or multilingual sentence-level classification task. However, profane language often contains code-mixing, a combination of more than one language, which requires a more nuanced detection approach than binary classification. A general lack of available code-mixed datasets aggravates the problem. To address this issue, we propose five word-level annotated hate speech datasets: EN and DE datasets from social networks, one subset of the DE-EN Offensive Content Detection Code-Switched Dataset, one DE-EN code-mixed German rap lyrics held-out test set, and a cross-domain held-out test set. We investigate the capacity of fine-tuned German-only, German-English bilingual, and German-English code-mixed token classification XLM-R models to generalize to code-mixed hate speech in German rap lyrics under zero-shot domain transfer as well as across different domains. The results show that bilingual fine-tuning facilitates not only the detection of code-mixed hate speech, but also of neologisms, addressing the inherent dynamics of profane language use.
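The abstract frames the task as word-level (token classification) hate speech detection with fine-tuned XLM-R models. Below is a minimal sketch of that setup using the Hugging Face transformers library; the model variant, binary label set, example sentence, and label alignment scheme are illustrative assumptions, not the authors' actual configuration or data.

import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

MODEL_NAME = "xlm-roberta-base"   # assumption: base XLM-R variant
LABELS = ["O", "HATE"]            # assumption: binary word-level tag set

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForTokenClassification.from_pretrained(
    MODEL_NAME, num_labels=len(LABELS)
)

# One invented code-mixed DE-EN example with per-word labels (illustrative only).
words = ["Du", "bist", "so", "ein", "loser", "Alter"]
word_labels = [0, 0, 0, 0, 1, 0]

# Tokenize pre-split words and align word-level labels to subword tokens;
# continuation subwords are masked with -100 so they are ignored by the loss.
enc = tokenizer(words, is_split_into_words=True, return_tensors="pt", truncation=True)
aligned, prev = [], None
for wid in enc.word_ids():
    aligned.append(-100 if wid is None or wid == prev else word_labels[wid])
    prev = wid
labels = torch.tensor([aligned])

# Forward pass returns the token-classification loss used during fine-tuning.
out = model(**enc, labels=labels)
print(out.loss.item(), out.logits.shape)

In practice this loss would be minimized over the annotated training sets (monolingual, bilingual, or code-mixed), and the resulting model evaluated on the held-out rap-lyrics and cross-domain test sets to measure zero-shot transfer.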