Sociocultural knowledge is needed for selection of shots in hate speech detection tasks

Antonis Maronikolakis, Abdullatif Köksal, Hinrich Schuetze


Abstract
We introduce HATELEXICON, a lexicon of slurs and targets of hate speech for Brazil, Germany, India and Kenya, to aid model development and interpretability. First, we demonstrate how HATELEXICON can be used to interpret model predictions, showing that models developed to classify extreme speech rely heavily on target group names. Second, we propose a culturally informed method to aid shot selection for training in low-resource settings. In few-shot learning, shot selection is of paramount importance to model performance, so we need to make the most of the available data. We train on HASOC German and Hindi data and evaluate on the Multilingual HateCheck (MHC) benchmark. We show that models trained on shots selected using our lexicon outperform models trained on randomly sampled shots. Thus, when only a few training examples are available, using HATELEXICON to select shots containing more sociocultural information leads to better few-shot performance. With these two use cases, we show how HATELEXICON can be used for more effective hate speech detection.
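The contrast between lexicon-based and random shot selection can be illustrated with a minimal sketch. This is not the authors' implementation. The toy lexicon (TOY_LEXICON), the helper names (lexicon_hits, select_shots_by_lexicon, select_shots_randomly) and the scoring rule are all hypothetical assumptions chosen for illustration. The rule counts the distinct single-token lexicon terms in each candidate, keeps the top k, and breaks ties randomly. The paper's actual procedure may weight terms, handle multi-word entries, or balance labels differently. Placeholder tokens stand in for real slurs and target-group names.

```python
import random
import re
from typing import List, Sequence, Tuple

# Hypothetical placeholder lexicon; NOT the real HATELEXICON entries.
TOY_LEXICON: List[str] = ["group_a", "group_b", "slur_x"]

# (text, label) pairs; label 1 = hateful, 0 = not hateful (assumed encoding).
Example = Tuple[str, int]


def lexicon_hits(text: str, lexicon: Sequence[str]) -> int:
    """Count distinct lexicon terms that occur as whole tokens in text.

    Assumption: lexicon entries are single tokens; multi-word terms are not matched.
    """
    tokens = set(re.findall(r"\w+", text.lower()))
    terms = {term.lower() for term in lexicon}
    return sum(1 for term in terms if term in tokens)


def select_shots_by_lexicon(
    pool: Sequence[Example], lexicon: Sequence[str], k: int, seed: int = 0
) -> List[Example]:
    """Return k examples, preferring those that mention more lexicon terms."""
    rng = random.Random(seed)
    candidates = list(pool)
    rng.shuffle(candidates)  # random order, used only to break ties
    # list.sort is stable (also with reverse=True), so equal scores keep the shuffled order.
    candidates.sort(key=lambda ex: lexicon_hits(ex[0], lexicon), reverse=True)
    return candidates[:k]


def select_shots_randomly(pool: Sequence[Example], k: int, seed: int = 0) -> List[Example]:
    """Baseline: sample k examples uniformly at random without replacement."""
    return random.Random(seed).sample(list(pool), k)


pool: List[Example] = [
    ("the weather is nice today", 0),
    ("group_a people should leave", 1),
    ("slur_x, group_b, go away", 1),
    ("I disagree with this policy", 0),
]

lexicon_shots = select_shots_by_lexicon(pool, TOY_LEXICON, k=2)
random_shots = select_shots_randomly(pool, k=2)
print("lexicon-based:", lexicon_shots)
print("random:", random_shots)
```

In the setting the abstract describes, each selection would then be used to train a few-shot classifier, and the resulting models would be compared on MHC. That training and evaluation loop is omitted from this sketch.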
Anthology ID:
2024.ltedi-1.1
Volume:
Proceedings of the Fourth Workshop on Language Technology for Equality, Diversity, Inclusion
Month:
March
Year:
2024
Address:
St. Julian's, Malta
Editors:
Bharathi Raja Chakravarthi, Bharathi B, Paul Buitelaar, Thenmozhi Durairaj, György Kovács, Miguel Ángel García Cumbreras
Venues:
LTEDI | WS
Publisher:
Association for Computational Linguistics
Pages:
1–13
URL:
https://aclanthology.org/2024.ltedi-1.1
Cite (ACL):
Antonis Maronikolakis, Abdullatif Köksal, and Hinrich Schuetze. 2024. Sociocultural knowledge is needed for selection of shots in hate speech detection tasks. In Proceedings of the Fourth Workshop on Language Technology for Equality, Diversity, Inclusion, pages 1–13, St. Julian's, Malta. Association for Computational Linguistics.
Cite (Informal):
Sociocultural knowledge is needed for selection of shots in hate speech detection tasks (Maronikolakis et al., LTEDI-WS 2024)
PDF:
https://preview.aclanthology.org/emnlp-22-attachments/2024.ltedi-1.1.pdf