Fadi Zaraket


2024

AREEj: Arabic Relation Extraction with Evidence
Osama Mraikhat | Hadi Hamoud | Fadi Zaraket
Proceedings of The Second Arabic Natural Language Processing Conference

Relational entity extraction is key in building knowledge graphs. A relational entity has a source, a tail and a type. In this paper, we consider Arabic text and introduce evidence enrichment, which intuitively informs models for better predictions. Relational evidence is an expression in the text that explains how sources and targets relate. This paper augments the existing relational extraction dataset by adding evidence annotation to its 2.9 million Arabic relations. We leverage the augmented dataset to build AREEj, a relation extraction with evidence model from Arabic documents. The evidence augmentation model we constructed to complete the dataset achieved .82 F1-score (.93 precision, .73 recall). The target outperformed SOTA mREBEL with .72 F1-score (.78 precision, .66 recall).
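The F1-scores quoted in this abstract can be checked against the reported precision and recall, assuming the standard definition of F1 as the harmonic mean of the two. The sketch below is illustrative only: the function and variable names are hypothetical and are not taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision/recall pairs exactly as reported in the abstract.
# Variable names are hypothetical labels for the two reported triples.
first_reported_f1 = f1_score(0.93, 0.73)   # reported alongside .82 F1
second_reported_f1 = f1_score(0.78, 0.66)  # reported alongside .72 F1

print(f"{first_reported_f1:.3f}")
print(f"{second_reported_f1:.3f}")
```

Both values land within rounding distance of the figures in the abstract (roughly .818 and .715), so the reported numbers are internally consistent under this definition.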

DRU at WojoodNER 2024: A Multi-level Method Approach
Hadi Hamoud | Chadi Chakra | Nancy Hamdan | Osama Mraikhat | Doha Albared | Fadi Zaraket
Proceedings of The Second Arabic Natural Language Processing Conference

In this paper, we present our submission for the WojoodNER 2024 Shared Tasks addressing the flat and nested sub-tasks (1, 2). We experiment with three different approaches. We train (i) Arabic fine-tuned versions of BLOOMZ-7b-mt, GEMMA-7b, and AraBERTv2 on a multi-label token classification task; (ii) two AraBERTv2 models, on main types and sub-types respectively; and (iii) one model for main types and four for the four sub-types. Based on the Wojood NER 2024 test set results, the three fine-tuned models performed similarly, with AraBERTv2 favored (F1: Flat=.8780 Nested=.9040). The five-model approach performed slightly better (F1: Flat=.8782 Nested=.9043).

DRU at WojoodNER 2024: ICL LLM for Arabic NER
Nancy Hamdan | Hadi Hamoud | Chadi Chakra | Osama Mraikhat | Doha Albared | Fadi Zaraket
Proceedings of The Second Arabic Natural Language Processing Conference

This paper details our submission to the WojoodNER Shared Task 2024, leveraging in-context learning (ICL) with large language models for Arabic Named Entity Recognition. We utilized the Command R model to perform fine-grained NER on the Wojood-Fine corpus. Our primary approach achieved an F1 score of 0.737 and a recall of 0.756. Post-processing the generated predictions to correct format inconsistencies resulted in an increased recall of 0.759 and a similar F1 score of 0.735. A multi-level prompting method and aggregation of outputs resulted in a lower F1 score of 0.637. Our results demonstrate the potential of ICL for Arabic NER while highlighting challenges related to LLM output consistency.

2023

Nâbra: Syrian Arabic Dialects with Morphological Annotations
Amal Nayouf | Tymaa Hammouda | Mustafa Jarrar | Fadi Zaraket | Mohamad-Bassam Kurdy
Proceedings of ArabicNLP 2023

This paper presents Nâbra (نَبْرَة), corpora of Syrian Arabic dialects with morphological annotations. A team of Syrian natives collected more than 6K sentences containing about 60K words from several sources, including social media posts, scripts of movies and series, lyrics of songs, and local proverbs, to build Nâbra. Nâbra covers several local Syrian dialects including those of Aleppo, Damascus, Deir-ezzur, Hama, Homs, Huran, Latakia, Mardin, Raqqah, and Suwayda. A team of nine annotators annotated the 60K tokens with full morphological annotations across sentence contexts. We trained the annotators to follow methodological annotation guidelines to ensure unique morpheme annotations, and normalized the annotations. F1 and 𝜅 agreement scores ranged between 74% and 98% across features, showing the excellent quality of Nâbra annotations. Our corpora are open-source and publicly available as part of the Currasat portal https://sina.birzeit.edu/currasat.

Arabic Topic Classification in the Generative and AutoML Era
Doha Albared | Hadi Hamoud | Fadi Zaraket
Proceedings of ArabicNLP 2023

Most recent models for Arabic topic classification leveraged fine-tuning of existing pre-trained transformer models and targeted a limited number of categories. More recently, advances in automated ML and generative models introduced novel potential for the task. While these approaches work for English, it is an open question whether they perform well for low-resourced languages, Arabic in particular. This paper presents (i) ArBoNeClass, a novel Arabic dataset with an extended 14-topic class set covering modern books from social sciences and humanities along with newspaper articles, and (ii) a set of topic classifiers built from it. We fine-tuned an open LLM to build ArGTClass. We compared its performance against the best models built with Vertex AI (Google), AutoML (H2O), and AutoTrain (HuggingFace). ArGTClass outperformed the Vertex AI and AutoML models and was reasonably similar to the AutoTrain model.

DAVE: Differential Diagnostic Analysis Automation and Visualization from Clinical Notes
Hadi Hamoud | Fadi Zaraket | Chadi Abou Chakra | Mira Dankar
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations

The Differential Analysis Visualizer for Electronic Medical Records (DAVE) is a tool that uses natural language processing and machine learning to visualize diagnostic algorithms in real time, supporting medical professionals in their clinical decision-making process.

2022

Curras + Baladi: Towards a Levantine Corpus
Karim Al-Haff | Mustafa Jarrar | Tymaa Hammouda | Fadi Zaraket
Proceedings of the Thirteenth Language Resources and Evaluation Conference

This paper makes a two-fold contribution: a full revision of the Palestinian morphologically annotated corpus (Curras), and a newly annotated Lebanese corpus (Baladi). Both corpora can be used as a more general Levantine corpus. Baladi consists of around 9.6K morphologically annotated tokens. Each token was manually annotated with several morphological features, using LDC’s SAMA lemmas and tags. The inter-annotator evaluation on most features shows 78.5% Kappa and 90.1% F1-Score. Curras was revised by refining all annotations for accuracy, normalizing and unifying POS tags, and linking with SAMA lemmas. This revision was also important to ensure that both corpora are compatible and can help to bridge the nuanced linguistic gaps that exist between the two highly mutually intelligible dialects. Both corpora are publicly available through a web portal.
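The abstract reports inter-annotator agreement as a Kappa score without naming the variant. As a minimal sketch, assuming the common two-annotator Cohen's kappa, agreement over per-token labels could be computed as below. The tag values and annotator lists are made up for illustration and do not come from Curras or Baladi.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's label distribution.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical POS tags from two annotators over the same six tokens.
annotator_1 = ["NOUN", "VERB", "NOUN", "PREP", "NOUN", "VERB"]
annotator_2 = ["NOUN", "VERB", "ADJ", "PREP", "NOUN", "NOUN"]
kappa = cohen_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.3f}")
```

Kappa discounts the agreement that two annotators would reach by chance, which is why it typically sits below the raw agreement or F1 figure reported for the same feature.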

2012

Arabic Morphological Analyzer with Agglutinative Affix Morphemes and Fusional Concatenation Rules
Fadi Zaraket | Jad Makhlouta
Proceedings of COLING 2012: Demonstration Papers