Samuel Guimarães


2024

HausaHate: An Expert Annotated Corpus for Hausa Hate Speech Detection
Francielle Vargas | Samuel Guimarães | Shamsuddeen Hassan Muhammad | Diego Alves | Ibrahim Said Ahmad | Idris Abdulmumin | Diallo Mohamed | Thiago Pardo | Fabrício Benevenuto
Proceedings of the 8th Workshop on Online Abuse and Harms (WOAH 2024)

We introduce the first expert-annotated corpus of Facebook comments for Hausa hate speech detection. The corpus, titled HausaHate, comprises 2,000 comments extracted from Western African Facebook pages and manually annotated by three native Hausa speakers who are also NLP experts. The corpus was annotated in two layers. We first labeled each comment as offensive or non-offensive. Offensive comments were then labeled by hate speech target: race, gender, or none. Lastly, we present a baseline model for Hausa hate speech detection based on a fine-tuned LLM, highlighting the challenges of hate speech detection for indigenous languages in Africa and directions for future work.