ArabJobs: A Multinational Corpus of Arabic Job Ads

Mo El-Haj


Abstract
ArabJobs is a publicly available corpus of Arabic job advertisements collected from Egypt, Jordan, Saudi Arabia, and the United Arab Emirates. Comprising over 8,500 postings and more than 550,000 words, the dataset captures linguistic, regional, and socio-economic variation in the Arab labour market. We present analyses of gender representation and occupational structure, and highlight dialectal variation across ads, which offers opportunities for future research. We also demonstrate applications such as salary estimation and job category normalisation using large language models, alongside benchmark tasks for gender bias detection and profession classification. The findings show the utility of ArabJobs for fairness-aware Arabic NLP and labour market research. The dataset is publicly available on GitHub: https://github.com/drelhaj/ArabJobs.
Anthology ID:
2025.arabicnlp-main.2
Volume:
Proceedings of The Third Arabic Natural Language Processing Conference
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Kareem Darwish, Ahmed Ali, Ibrahim Abu Farha, Samia Touileb, Imed Zitouni, Ahmed Abdelali, Sharefah Al-Ghamdi, Sakhar Alkhereyf, Wajdi Zaghouani, Salam Khalifa, Badr AlKhamissi, Rawan Almatham, Injy Hamed, Zaid Alyafeai, Areeb Alowisheq, Go Inoue, Khalil Mrini, Waad Alshammari
Venue:
ArabicNLP
Publisher:
Association for Computational Linguistics
Pages:
16–25
URL:
https://preview.aclanthology.org/ingest-emnlp/2025.arabicnlp-main.2/
Cite (ACL):
Mo El-Haj. 2025. ArabJobs: A Multinational Corpus of Arabic Job Ads. In Proceedings of The Third Arabic Natural Language Processing Conference, pages 16–25, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
ArabJobs: A Multinational Corpus of Arabic Job Ads (El-Haj, ArabicNLP 2025)
PDF:
https://preview.aclanthology.org/ingest-emnlp/2025.arabicnlp-main.2.pdf