Fadhl Eryani


2021

pdf
Automatic Romanization of Arabic Bibliographic Records
Fadhl Eryani | Nizar Habash
Proceedings of the Sixth Arabic Natural Language Processing Workshop

International library standards require cataloguers to tediously input Romanization of their catalogue records for the benefit of library users without specific language expertise. In this paper, we present the first reported results on the task of automatic Romanization of undiacritized Arabic bibliographic entries. This complex task requires the modeling of Arabic phonology, morphology, and even semantics. We collected a 2.5M word corpus of parallel Arabic and Romanized bibliographic entries, and benchmarked a number of models that vary in terms of complexity and resource dependence. Our best system reaches 89.3% exact word Romanization on a blind test set. We make our data and code publicly available.

2020

pdf
A Spelling Correction Corpus for Multiple Arabic Dialects
Fadhl Eryani | Nizar Habash | Houda Bouamor | Salam Khalifa
Proceedings of the Twelfth Language Resources and Evaluation Conference

Arabic dialects are the non-standard varieties of Arabic commonly spoken – and increasingly written on social media – across the Arab world. Arabic dialects do not have standard orthographies, a challenge for natural language processing applications. In this paper, we present the MADAR CODA Corpus, a collection of 10,000 sentences from five Arabic city dialects (Beirut, Cairo, Doha, Rabat, and Tunis) represented in the Conventional Orthography for Dialectal Arabic (CODA) in parallel with their raw original form. The sentences come from the Multi-Arabic Dialect Applications and Resources (MADAR) Project and are in parallel across the cities (2,000 sentences from each city). This publicly available resource is intended to support research on spelling correction and text normalization for Arabic dialects. We present results on a bootstrapping technique we use to speed up the CODA annotation, as well as on the degree of similarity across the dialects before and after CODA annotation.

pdf
CAMeL Tools: An Open Source Python Toolkit for Arabic Natural Language Processing
Ossama Obeid | Nasser Zalmout | Salam Khalifa | Dima Taji | Mai Oudah | Bashar Alhafni | Go Inoue | Fadhl Eryani | Alexander Erdmann | Nizar Habash
Proceedings of the Twelfth Language Resources and Evaluation Conference

We present CAMeL Tools, a collection of open-source tools for Arabic natural language processing in Python. CAMeL Tools currently provides utilities for pre-processing, morphological modeling, Dialect Identification, Named Entity Recognition and Sentiment Analysis. In this paper, we describe the design of CAMeL Tools and the functionalities it provides.

2018

pdf
An Arabic Morphological Analyzer and Generator with Copious Features
Dima Taji | Salam Khalifa | Ossama Obeid | Fadhl Eryani | Nizar Habash
Proceedings of the Fifteenth Workshop on Computational Research in Phonetics, Phonology, and Morphology

We introduce CALIMA-Star, a very rich Arabic morphological analyzer and generator that provides functional and form-based morphological features as well as built-in tokenization, phonological representation, lexical rationality and much more. This tool includes a fast engine that can be easily integrated into other systems, as well as an easy-to-use API and a web interface. CALIMA-Star also supports morphological reinflection. We evaluate CALIMA-Star against four commonly used analyzers for Arabic in terms of speed and morphological content.

pdf
The MADAR Arabic Dialect Corpus and Lexicon
Houda Bouamor | Nizar Habash | Mohammad Salameh | Wajdi Zaghouani | Owen Rambow | Dana Abdulrahim | Ossama Obeid | Salam Khalifa | Fadhl Eryani | Alexander Erdmann | Kemal Oflazer
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

pdf
Unified Guidelines and Resources for Arabic Dialect Orthography
Nizar Habash | Fadhl Eryani | Salam Khalifa | Owen Rambow | Dana Abdulrahim | Alexander Erdmann | Reem Faraj | Wajdi Zaghouani | Houda Bouamor | Nasser Zalmout | Sara Hassan | Faisal Al-Shargi | Sakhar Alkhereyf | Basma Abdulkareem | Ramy Eskander | Mohammad Salameh | Hind Saddiki
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

pdf
A Morphologically Annotated Corpus of Emirati Arabic
Salam Khalifa | Nizar Habash | Fadhl Eryani | Ossama Obeid | Dana Abdulrahim | Meera Al Kaabi
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)