Najet Hadj Mohamed

2022

pdf abs
Annotating Verbal Multiword Expressions in Arabic: Assessing the Validity of a Multilingual Annotation Procedure
Najet Hadj Mohamed | Cherifa Ben Khelil | Agata Savary | Iskandar Keskes | Jean-Yves Antoine | Lamia Hadrich-Belguith
Proceedings of the Thirteenth Language Resources and Evaluation Conference

This paper describes our efforts to extend the PARSEME framework to Modern Standard Arabic. Theapplicability of the PARSEME guidelines was tested by measuring the inter-annotator agreement in theearly annotation stage. A subset of 1,062 sentences from the Prague Arabic Dependency Treebank PADTwas selected and annotated by two Arabic native speakers independently. Following their annotations, anew Arabic corpus with over 1,250 annotated VMWEs has been built. This corpus already exceeds thesmallest corpora of the PARSEME suite, and enables first observations. We discuss our annotation guide-line schema that shows full MWE annotation is realizable in Arabic where we get good inter-annotator agreement.

pdf abs
Annotation d’expressions polylexicales verbales en arabe : validation d’une procédure d’annotation multilingue (Annotating Verbal Multiword Expressions in Arabic : Assessing the Validity of a Multilingual)
Najet Hadj Mohamed | Cherifa Ben Khelil | Agata Savary | Iskander Keskes | Jean Yves Antoine | Lamia Hadrich Belguith
Actes de la 29e Conférence sur le Traitement Automatique des Langues Naturelles. Volume 1 : conférence principale

Cet article décrit nos efforts pour étendre le projet PARSEME à l’arabe standard moderne. L’applicabilité du guide d’annotation de PARSEME a été testée en mesurant l’accord inter-annotateurs dès la première phase d’annotation. Un sous-ensemble de 1062 phrases du Prague Arabic Dependency Treebank (PADT) a été sélectionné et annoté indépendamment par deux locutrices natives arabes. Suite à leurs annotations, un nouveau corpus arabe avec plus de 1250 expressions polylexicales verbales (EPV) annotées a été construit.

pdf abs
Enhancing the PARSEME Turkish Corpus of Verbal Multiword Expressions
Yagmur Ozturk | Najet Hadj Mohamed | Adam Lion-Bouton | Agata Savary
Proceedings of the 18th Workshop on Multiword Expressions @LREC2022

The PARSEME (Parsing and Multiword Expressions) project proposes multilingual corpora annotated for multiword expressions (MWEs). In this case study, we focus on the Turkish corpus of PARSEME. Turkish is an agglutinative language and shows high inflection and derivation in word forms. This can cause some issues in terms of automatic morphosyntactic annotation. We provide an overview of the problems observed in the morphosyntactic annotation of the Turkish PARSEME corpus. These issues are mostly observed on the lemmas, which is important for the approximation of a type of an MWE. We propose modifications of the original corpus with some enhancements on the lemmas and parts of speech. The enhancements are then evaluated with an identification system from the PARSEME Shared Task 1.2 to detect MWEs, namely Seen2Seen. Results show increase in the F-measure for MWE identification, emphasizing the necessity of robust morphosyntactic annotation for MWE processing, especially for languages that show high surface variability.

Co-authors

Iskander Keskes 1

Yagmur Ozturk 1

Adam Lion-Bouton 1