Khouloud Mnassri
2025
RAG and Recall: Multilingual Hate Speech Detection with Semantic Memory
Khouloud Mnassri | Reza Farahbakhsh | Noel Crespi
Proceedings of the 9th Workshop on Online Abuse and Harms (WOAH)
Multilingual hate speech detection is challenging, particularly in limited-resource contexts where performance suffers from cultural nuances and data scarcity. Fine-tuned models often fail to generalize beyond their training data, which limits their effectiveness, especially for low-resource languages. In this paper, we introduce HS-RAG, a retrieval-augmented generation (RAG) system that supplies Large Language Models (LLMs) with knowledge in English, French, and Arabic drawn from Hate Speech Superset (a publicly available dataset) and Wikipedia. To further enhance robustness, we introduce HS-MemRAG, a memory-augmented extension that integrates a semantic cache. This model reduces redundant retrieval while improving contextual relevance and hate speech detection across the three languages.
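
The semantic cache idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the bag-of-words `embed` function stands in for a real multilingual sentence encoder, and the cosine-similarity threshold of 0.8, the class name `SemanticCache`, and the helper `retrieve_with_cache` are all assumptions made for illustration. The sketch only shows the general pattern, in which a query that is close enough to an earlier one reuses the stored context instead of triggering a new retrieval.

```python
import math
from collections import Counter
from typing import Callable, List, Optional, Tuple


def embed(text: str) -> Counter:
    # Toy stand-in (assumption) for a multilingual sentence encoder:
    # a bag-of-words count vector over whitespace tokens.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class SemanticCache:
    """Hypothetical cache keyed by query similarity rather than exact match."""

    def __init__(self, threshold: float = 0.8) -> None:
        self.threshold = threshold  # assumed value, not taken from the paper
        self.entries: List[Tuple[Counter, List[str]]] = []

    def lookup(self, query: str) -> Optional[List[str]]:
        # Return the documents of the most similar stored query,
        # or None if nothing reaches the threshold.
        q = embed(query)
        best_score, best_docs = 0.0, None
        for vec, docs in self.entries:
            score = cosine(q, vec)
            if score > best_score:
                best_score, best_docs = score, docs
        return best_docs if best_score >= self.threshold else None

    def store(self, query: str, docs: List[str]) -> None:
        self.entries.append((embed(query), docs))


def retrieve_with_cache(
    query: str,
    cache: SemanticCache,
    retriever: Callable[[str], List[str]],
) -> List[str]:
    # Reuse cached context on a semantic hit; otherwise retrieve and store.
    cached = cache.lookup(query)
    if cached is not None:
        return cached
    docs = retriever(query)
    cache.store(query, docs)
    return docs
```

In a real pipeline, `retriever` would query an index built over the knowledge sources, and the returned passages would be placed in the LLM prompt alongside the post to classify. The threshold trades retrieval savings against the risk of reusing context that does not fit the new query.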