This is an internal, incomplete preview of a proposed change to the ACL Anthology.
For efficiency reasons, we don't generate MODS or Endnote formats, and the preview may be incomplete in other ways, or contain mistakes.
Do not treat this content as an official publication.
IskandarKeskes
Fixing paper assignments
Please select all papers that belong to the same person.
Indicate below which author they should be assigned to.
This paper highlights the importance of integrating MWE identification with the development of syntactic MWE lexicons. It suggests that lexicons with minimal morphosyntactic information can amplify current MWE-annotated datasets and refine identification strategies. To our knowledge, this work represents the first attempt to focus on both seen and unseen of VMWEs for Arabic. It also deals with the challenge of differentiating between literal and figurative interpretations of idiomatic expressions. The approach involves a dual-phase procedure: first projecting a VMWE lexicon onto a corpus to identify candidate occurrences, then disambiguating these occurrences to distinguish idiomatic from literal instances. Experiments outlined in the paper aim to assess the efficacy of this technique, utilizing a lexicon known as LEXAR and the “parseme-ar” corpus. The findings suggest that lexicon-driven strategies have the potential to refine MWE identification, particularly for unseen occurrences.
This paper describes our efforts to extend the PARSEME framework to Modern Standard Arabic. Theapplicability of the PARSEME guidelines was tested by measuring the inter-annotator agreement in theearly annotation stage. A subset of 1,062 sentences from the Prague Arabic Dependency Treebank PADTwas selected and annotated by two Arabic native speakers independently. Following their annotations, anew Arabic corpus with over 1,250 annotated VMWEs has been built. This corpus already exceeds thesmallest corpora of the PARSEME suite, and enables first observations. We discuss our annotation guide-line schema that shows full MWE annotation is realizable in Arabic where we get good inter-annotator agreement.
This paper describes a rule-based approach to segment Arabic texts into clauses. Our method relies on an extensive analysis of a large set of lexical cues as well as punctuation marks. Our analysis was carried out on two different corpus genres: news articles and elementary school textbooks. We propose a three steps segmentation algorithm: first by using only punctuation marks, then by relying only on lexical cues and finally by using both typology and lexical cues. The results were compared with manual segmentations elaborated by experts.
Dans cet article, nous nous intéressons au résumé automatique de textes arabes. Nous commençons par présenter une étude analytique réalisée sur un corpus de travail qui nous a permis de déduire, suite à des observations empiriques, un ensemble de relations et de frames (règles ou patrons) rhétoriques; ensuite nous présentons notre méthode de production de résumés pour les textes arabes. La méthode que nous proposons se base sur la Théorie de la Structure Rhétorique (RST) (Mann et al., 1988) et utilise des connaissances purement linguistiques. Le principe de notre proposition s’appuie sur trois piliers. Le premier pilier est le repérage des relations rhétoriques entres les différentes unités minimales du texte dont l’une possède le statut de noyau – segment de texte primordial pour la cohérence – et l’autre a le statut noyau ou satellite – segment optionnel. Le deuxième pilier est le dressage et la simplification de l’arbre RST. Le troisième pilier est la sélection des phrases noyaux formant le résumé final, qui tiennent en compte le type de relation rhétoriques choisi pour l’extrait.