Abstract
We present XHate-999, a multi-domain and multilingual evaluation data set for abusive language detection. By aligning test instances across six typologically diverse languages, XHate-999 for the first time allows for disentanglement of the domain transfer and language transfer effects in abusive language detection. We conduct a series of domain- and language-transfer experiments with state-of-the-art monolingual and multilingual transformer models, setting strong baseline results and profiling XHate-999 as a comprehensive evaluation resource for abusive language detection. Finally, we show that domain and language adaptation, via intermediate masked language modeling on abusive corpora in the target language, can lead to substantially improved abusive language detection in the target language in the zero-shot transfer setups.
- Anthology ID:
- 2020.coling-main.559
- Volume:
- Proceedings of the 28th International Conference on Computational Linguistics
- Month:
- December
- Year:
- 2020
- Address:
- Barcelona, Spain (Online)
- Venue:
- COLING
- Publisher:
- International Committee on Computational Linguistics
- Pages:
- 6350–6365
- URL:
- https://aclanthology.org/2020.coling-main.559
- DOI:
- 10.18653/v1/2020.coling-main.559
- Cite (ACL):
- Goran Glavaš, Mladen Karan, and Ivan Vulić. 2020. XHate-999: Analyzing and Detecting Abusive Language Across Domains and Languages. In Proceedings of the 28th International Conference on Computational Linguistics, pages 6350–6365, Barcelona, Spain (Online). International Committee on Computational Linguistics.
- Cite (Informal):
- XHate-999: Analyzing and Detecting Abusive Language Across Domains and Languages (Glavaš et al., COLING 2020)
- PDF:
- https://preview.aclanthology.org/starsem-semeval-split/2020.coling-main.559.pdf