Abstract
Online hate speech is a dangerous phenomenon that can (and should) be counteracted promptly and properly. While Natural Language Processing supplies appropriate algorithms for pursuing this objective, all research efforts are directed toward the English language. This strongly limits the classification power for non-English languages. In this paper, we test several learning frameworks for identifying hate speech in Italian text. We release HATE-ITA, a multi-language model trained on a large set of English data and available Italian datasets. HATE-ITA performs better than mono-lingual models and also seems to adapt well to language-specific slurs. We hope our findings will encourage research in other mid-to-low resource communities and provide a valuable benchmarking tool for the Italian community.

- Anthology ID:
- 2022.woah-1.24
- Volume:
- Proceedings of the Sixth Workshop on Online Abuse and Harms (WOAH)
- Month:
- July
- Year:
- 2022
- Address:
- Seattle, Washington (Hybrid)
- Editors:
- Kanika Narang, Aida Mostafazadeh Davani, Lambert Mathias, Bertie Vidgen, Zeerak Talat
- Venue:
- WOAH
- Publisher:
- Association for Computational Linguistics
- Pages:
- 252–260
- URL:
- https://preview.aclanthology.org/icon-24-ingestion/2022.woah-1.24/
- DOI:
- 10.18653/v1/2022.woah-1.24
- Cite (ACL):
- Debora Nozza, Federico Bianchi, and Giuseppe Attanasio. 2022. HATE-ITA: Hate Speech Detection in Italian Social Media Text. In Proceedings of the Sixth Workshop on Online Abuse and Harms (WOAH), pages 252–260, Seattle, Washington (Hybrid). Association for Computational Linguistics.
- Cite (Informal):
- HATE-ITA: Hate Speech Detection in Italian Social Media Text (Nozza et al., WOAH 2022)
- PDF:
- https://preview.aclanthology.org/icon-24-ingestion/2022.woah-1.24.pdf
- Code:
- milanlproc/hate-ita