WojoodRelations: Arabic Relation Extraction Corpus and Modeling

Alaa Aljabari; Mohammed Khalilia; Mustafa Jarrar

Abstract

Relation extraction (RE) is a core task in natural language processing, crucial for semantic understanding and knowledge graph construction, and it benefits many downstream applications. Work on Arabic RE remains limited, owing both to the language’s rich morphology and syntactic complexity and to the lack of large, high-quality datasets. In this paper, we present Wojood^Relations, the largest and most diverse Arabic RE corpus to date, containing over 33K sentences (∼550K tokens) annotated with ∼15K relation triples across 40 relation types. The corpus is built on top of the Wojood NER dataset, with relations annotated manually by expert annotators; inter-annotator agreement reaches a Cohen’s κ of 0.92, indicating high reliability. We also propose two methods: NLI-RE, which formulates RE as a binary natural language inference problem using relation-aware templates, and GPT-Joint, a few-shot LLM framework for joint entity and relation extraction that uses relation-aware retrieval. Finally, we benchmark the dataset with both supervised models and in-context learning with LLMs. Supervised models achieve 92.89% F1 on RE, while LLMs reach 72.73% F1 on joint entity and relation extraction. These results establish strong baselines, highlight key challenges, and provide a foundation for advancing Arabic RE research.
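The NLI-RE idea of casting RE as binary natural language inference can be sketched as follows: each candidate relation type gets a template that turns an entity pair into a hypothesis, the sentence serves as the premise, and the relation whose hypothesis is most strongly entailed (above a threshold) is predicted. This is a minimal sketch under stated assumptions, not the authors' implementation: the relation names, the English templates, the 0.5 threshold, and the word-overlap `toy_scorer` are all hypothetical stand-ins; a real system would score entailment with a trained NLI model and use the paper's own relation-aware templates.

```python
from typing import Callable, Dict, Optional

# Hypothetical relation-aware templates (illustrative only); the actual
# NLI-RE templates and the 40 Wojood^Relations relation types are not shown.
TEMPLATES: Dict[str, str] = {
    "located_in": "{head} is located in {tail}.",
    "works_for": "{head} works for {tail}.",
    "born_in": "{head} was born in {tail}.",
}

# An entailment scorer maps (premise, hypothesis) to P(entailment).
EntailmentScorer = Callable[[str, str], float]


def build_hypotheses(head: str, tail: str) -> Dict[str, str]:
    """Fill every relation template with the candidate entity pair."""
    return {rel: tpl.format(head=head, tail=tail) for rel, tpl in TEMPLATES.items()}


def nli_relation(sentence: str, head: str, tail: str,
                 scorer: EntailmentScorer, threshold: float = 0.5) -> Optional[str]:
    """Query each relation as a binary entailment decision and return the
    highest-scoring relation whose score exceeds the threshold, else None."""
    best_rel, best_score = None, threshold
    for rel, hypothesis in build_hypotheses(head, tail).items():
        score = scorer(sentence, hypothesis)
        if score > best_score:
            best_rel, best_score = rel, score
    return best_rel


def toy_scorer(premise: str, hypothesis: str) -> float:
    """Hypothetical stand-in for a trained NLI model: the fraction of
    hypothesis words that also occur in the premise."""
    hyp_words = [w.strip(".,").lower() for w in hypothesis.split()]
    premise_words = {w.strip(".,").lower() for w in premise.split()}
    return sum(w in premise_words for w in hyp_words) / len(hyp_words)
```

For example, `nli_relation("Sara works for Acme.", "Sara", "Acme", toy_scorer)` returns `"works_for"`, while `nli_relation("Sara met Omar.", "Sara", "Omar", toy_scorer)` returns `None` because no hypothesis clears the threshold. Keeping the scorer pluggable lets the same template loop wrap any entailment model.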

Anthology ID: 2025.emnlp-main.1741
Volume: Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month: November
Year: 2025
Address: Suzhou, China
Editors: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 34330–34348
URL: https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.1741/
Cite (ACL): Alaa Aljabari, Mohammed Khalilia, and Mustafa Jarrar. 2025. WojoodRelations: Arabic Relation Extraction Corpus and Modeling. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 34330–34348, Suzhou, China. Association for Computational Linguistics.
Cite (Informal): WojoodRelations: Arabic Relation Extraction Corpus and Modeling (Aljabari et al., EMNLP 2025)
PDF: https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.1741.pdf
Checklist: 2025.emnlp-main.1741.checklist.pdf