Leveraging Semi-Supervised Learning for Multimodal Hate Speech Data Annotation and Detection

Rathi Adarshi Rammohan, Zhao Ren, Dominik Puchała, Aleksandra Świderska, Dennis Küster, Tanja Schultz


Abstract
While the Internet and social media have fundamentally transformed our lives, they can also rapidly spread hate speech, i.e., derogatory statements targeting individuals or groups based on their immutable characteristics. Automatic detection systems could help limit this harmful phenomenon. However, the lack of large-scale annotated datasets remains a major bottleneck for developing better algorithms. In this work, we employ semi-supervised learning (SSL) to combine a limited amount of labeled data with large amounts of unlabeled data. We apply three SSL approaches, FixMatch, FullMatch, and AllMatch, to improve the performance of end-to-end pre-trained speech and text models for hate speech detection. Our findings indicate that SSL methods improve performance, achieving F1 scores of 0.851 on speech, 0.957 on text, and 0.959 with multimodal fusion. Furthermore, we analyze the impact of different weak augmentation strategies on labeled data and assess the quality of the generated pseudo-labels to evaluate their potential for data annotation.
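FixMatch, the first SSL method named above, trains on unlabeled data through confidence-thresholded pseudo-labeling. The following is a minimal, hypothetical Python sketch of a FixMatch-style unlabeled loss and of a simple pseudo-label quality check (coverage, and agreement with held-out gold labels). It is not the authors' implementation. It assumes the model's class probabilities for a weakly and a strongly augmented view of each unlabeled example have already been computed, and it uses the threshold 0.95 from the original FixMatch paper as an assumed default. The function names are illustrative. The FullMatch and AllMatch refinements are not shown.

```python
import math
from typing import List, Optional, Sequence, Tuple


def pseudo_label(probs: Sequence[float], threshold: float) -> Optional[int]:
    """Return the argmax class if its probability reaches the threshold, else None."""
    best = max(range(len(probs)), key=lambda k: probs[k])
    return best if probs[best] >= threshold else None


def fixmatch_unlabeled_loss(
    weak_probs: List[Sequence[float]],
    strong_probs: List[Sequence[float]],
    threshold: float = 0.95,  # assumed default, taken from the FixMatch paper
) -> float:
    """FixMatch-style consistency loss on one unlabeled batch (hypothetical helper).

    Confident predictions on the weak view become hard pseudo-labels; the
    strong view is pushed toward them with cross-entropy. The sum is divided
    by the full batch size, so low-confidence samples contribute zero.
    """
    total = 0.0
    for pw, ps in zip(weak_probs, strong_probs):
        label = pseudo_label(pw, threshold)
        if label is not None:
            total += -math.log(max(ps[label], 1e-12))  # clamp to avoid log(0)
    return total / len(weak_probs)


def pseudo_label_quality(
    weak_probs: List[Sequence[float]],
    gold: Sequence[int],
    threshold: float = 0.95,
) -> Tuple[float, float]:
    """Coverage (share of samples that receive a pseudo-label) and accuracy
    of those pseudo-labels against gold annotations (hypothetical helper)."""
    labels = [pseudo_label(p, threshold) for p in weak_probs]
    kept = [(lab, g) for lab, g in zip(labels, gold) if lab is not None]
    coverage = len(kept) / len(labels)
    accuracy = sum(lab == g for lab, g in kept) / len(kept) if kept else 0.0
    return coverage, accuracy


if __name__ == "__main__":
    # Toy binary example (class 1 = hate, class 0 = non-hate; assumed encoding).
    weak = [[0.02, 0.98], [0.6, 0.4], [0.97, 0.03]]
    strong = [[0.1, 0.9], [0.5, 0.5], [0.8, 0.2]]
    print(round(fixmatch_unlabeled_loss(weak, strong), 4))  # 0.1095
    print(pseudo_label_quality(weak, gold=[1, 0, 1]))  # coverage 2/3, accuracy 0.5
```

In this toy batch the second sample stays below the threshold and is ignored. Of the two confident pseudo-labels, one disagrees with its gold label. This is why pseudo-label accuracy has to be checked before pseudo-labels are reused as annotations.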
Anthology ID: 2026.lrec-main.806
Volume: Proceedings of the Fifteenth Language Resources and Evaluation Conference
Month: May
Year: 2026
Address: Palma de Mallorca, Spain
Editors: Stelios Piperidis, Núria Bel, Henk van den Heuvel, Nancy Ide, Simon Krek, Antonio Toral
Venue: LREC
Publisher: ELRA Language Resource Association
Pages: 10266–10275
URL: https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.806/
Cite (ACL): Rathi Adarshi Rammohan, Zhao Ren, Dominik Puchała, Aleksandra Świderska, Dennis Küster, and Tanja Schultz. 2026. Leveraging Semi-Supervised Learning for Multimodal Hate Speech Data Annotation and Detection. International Conference on Language Resources and Evaluation, main:10266–10275.
Cite (Informal): Leveraging Semi-Supervised Learning for Multimodal Hate Speech Data Annotation and Detection (Rammohan et al., LREC 2026)
PDF: https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.806.pdf