ArabDiscrim: A Decade-Long Arabic Facebook Corpus on Racism and Discrimination

Wajdi Zaghouani, Shimaa Amer Ibrahim, Mabrouka Bessghaier, Houda Bouamor


Abstract
We present ArabDiscrim, a lexical resource and corpus of 293K public Arabic Facebook posts spanning a decade (2014–2024) that discuss racism and discrimination. Unlike existing Twitter-centric datasets, ArabDiscrim integrates platform-native engagement signals, including reactions, shares, comments, and page metadata, enabling joint analysis of language and audience response. The resource includes 200 curated terms (100 for racism, 100 for discrimination), each expanded into a morphological regex family (13+ inflections per lemma), 20 discrimination axes capturing identity-based grounds for unequal treatment, and explicit attribution patterns. Released under a restricted research-use license for ethical compliance with platform terms, ArabDiscrim supports weak supervision, axis-aware sampling, and platform ecology research. By bridging lexical depth and ecological validity, it establishes a foundation for fairness-oriented, platform-aware Arabic NLP.
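To illustrate how a morphological regex family could drive weak supervision of the kind the abstract mentions, here is a minimal Python sketch. The clitic and suffix inventories, the two example stems, and the names build_family and weak_label are assumptions made for this example; they are not the released ArabDiscrim patterns, which may be constructed quite differently.

```python
import re

# Hypothetical inventories for illustration; not the released ArabDiscrim patterns.
PROCLITICS = "وفبلك"  # one-letter conjunction/preposition clitics
ARTICLE = "ال"  # definite article
SUFFIXES = ["ي", "ية", "يين", "يون", "يات"]  # adjectival and plural endings


def build_family(stem, allow_bare=False):
    """Compile one regex covering clitic + article + stem + suffix variants."""
    alts = "|".join(sorted(SUFFIXES, key=len, reverse=True))
    # When allow_bare is False, a suffix is required, so the bare stem is skipped.
    suffix = f"(?:{alts})" + ("?" if allow_bare else "")
    pattern = (
        r"(?<!\w)"            # token must not continue a preceding word
        + f"[{PROCLITICS}]?"  # optional proclitic
        + f"(?:{ARTICLE})?"   # optional definite article
        + re.escape(stem)
        + suffix
        + r"(?!\w)"           # token must end here
    )
    return re.compile(pattern)


# Two toy families: one stem per category (assumed, for demonstration only).
FAMILIES = {
    "racism": [build_family("عنصر")],
    "discrimination": [build_family("تمييز", allow_bare=True)],
}


def weak_label(post, families=FAMILIES):
    """Return the set of category labels whose regex families match the post."""
    return {
        label
        for label, patterns in families.items()
        if any(p.search(post) for p in patterns)
    }
```

Requiring a suffix on the first stem keeps the unrelated bare noun meaning "element" from firing the racism label, which is one reason a family-per-lemma design can be preferable to plain substring matching.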
Anthology ID:
2026.lrec-main.929
Volume:
Proceedings of the Fifteenth Language Resources and Evaluation Conference
Month:
May
Year:
2026
Address:
Palma de Mallorca, Spain
Editors:
Stelios Piperidis, Núria Bel, Henk van den Heuvel, Nancy Ide, Simon Krek, Antonio Toral
Venue:
LREC
Publisher:
ELRA Language Resource Association
Pages:
11874–11884
URL:
https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.929/
Cite (ACL):
Wajdi Zaghouani, Shimaa Amer Ibrahim, Mabrouka Bessghaier, and Houda Bouamor. 2026. ArabDiscrim: A Decade-Long Arabic Facebook Corpus on Racism and Discrimination. In Proceedings of the Fifteenth Language Resources and Evaluation Conference, pages 11874–11884, Palma de Mallorca, Spain. ELRA Language Resource Association.
Cite (Informal):
ArabDiscrim: A Decade-Long Arabic Facebook Corpus on Racism and Discrimination (Zaghouani et al., LREC 2026)
PDF:
https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.929.pdf