RAG and Recall: Multilingual Hate Speech Detection with Semantic Memory

Khouloud Mnassri, Reza Farahbakhsh, Noel Crespi


Abstract
Multilingual hate speech detection is a challenging task, particularly in low-resource settings where performance is affected by cultural nuances and data scarcity. Fine-tuned models often fail to generalize beyond their training data, which limits their effectiveness, especially for low-resource languages. In this paper, we introduce HS-RAG, a retrieval-augmented generation (RAG) system that supplies Large Language Models (LLMs) with knowledge in English, French, and Arabic drawn from the Hate Speech Superset (a publicly available dataset) and Wikipedia. To further enhance robustness, we introduce HS-MemRAG, a memory-augmented extension that integrates a semantic cache. This model reduces redundant retrieval while improving contextual relevance and hate speech detection across the three languages.
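The abstract does not specify how the semantic cache is built, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a toy bag-of-words embedding in place of a real multilingual encoder, and the names `embed`, `cosine`, `SemanticCache`, `retrieve`, and the `0.8` similarity threshold are all hypothetical choices made for illustration.

```python
import math
from collections import Counter
from typing import Callable, List, Tuple


def embed(text: str) -> Counter:
    """Toy bag-of-words vector (assumption); a real system would use a multilingual encoder."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class SemanticCache:
    """Reuses retrieved context for queries that are semantically close to earlier ones."""

    def __init__(self, retrieve: Callable[[str], List[str]], threshold: float = 0.8):
        self.retrieve = retrieve      # hypothetical retriever, called only on a cache miss
        self.threshold = threshold    # hypothetical similarity cut-off
        self.entries: List[Tuple[Counter, List[str]]] = []
        self.retrieval_calls = 0

    def lookup(self, query: str) -> List[str]:
        q = embed(query)
        # Find the most similar cached query, if any.
        best = max(self.entries, key=lambda e: cosine(q, e[0]), default=None)
        if best is not None and cosine(q, best[0]) >= self.threshold:
            return best[1]            # cache hit: skip retrieval
        # Cache miss: run the retriever and remember its result.
        self.retrieval_calls += 1
        context = self.retrieve(query)
        self.entries.append((q, context))
        return context


if __name__ == "__main__":
    cache = SemanticCache(lambda q: [f"retrieved passage for: {q}"])
    cache.lookup("example post about some group")
    cache.lookup("Example post about some group today")  # near-duplicate query
    print("retriever calls:", cache.retrieval_calls)
```

In this sketch a near-duplicate query is answered from the cache rather than triggering a second retrieval, which is the sense in which a semantic cache can reduce redundant retrieval. The retrieved context would then be passed to the LLM along with the input text.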
Anthology ID:
2025.woah-1.20
Volume:
Proceedings of the 9th Workshop on Online Abuse and Harms (WOAH)
Month:
August
Year:
2025
Address:
Vienna, Austria
Editors:
Agostina Calabrese, Christine de Kock, Debora Nozza, Flor Miriam Plaza-del-Arco, Zeerak Talat, Francielle Vargas
Venues:
WOAH | WS
Publisher:
Association for Computational Linguistics
Pages:
219–227
URL:
https://preview.aclanthology.org/landing_page/2025.woah-1.20/
Cite (ACL):
Khouloud Mnassri, Reza Farahbakhsh, and Noel Crespi. 2025. RAG and Recall: Multilingual Hate Speech Detection with Semantic Memory. In Proceedings of the 9th Workshop on Online Abuse and Harms (WOAH), pages 219–227, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
RAG and Recall: Multilingual Hate Speech Detection with Semantic Memory (Mnassri et al., WOAH 2025)
PDF:
https://preview.aclanthology.org/landing_page/2025.woah-1.20.pdf
Supplementary material:
2025.woah-1.20.SupplementaryMaterial.zip