Mai Oudah


Fixing paper assignments

  1. Please select all papers that belong to the same person.
  2. Indicate below which author they should be assigned to.
Provide a valid ORCID iD here. This will be used to match future papers to this author.
Provide the name of the school or the university where the author has received or will receive their highest degree (e.g., Ph.D. institution for researchers, or current affiliation for students). This will be used to form the new author page ID, if needed.

TODO: "submit" and "cancel" buttons here


2020

pdf bib
CAMeL Tools: An Open Source Python Toolkit for Arabic Natural Language Processing
Ossama Obeid | Nasser Zalmout | Salam Khalifa | Dima Taji | Mai Oudah | Bashar Alhafni | Go Inoue | Fadhl Eryani | Alexander Erdmann | Nizar Habash
Proceedings of the Twelfth Language Resources and Evaluation Conference

We present CAMeL Tools, a collection of open-source tools for Arabic natural language processing in Python. CAMeL Tools currently provides utilities for pre-processing, morphological modeling, Dialect Identification, Named Entity Recognition and Sentiment Analysis. In this paper, we describe the design of CAMeL Tools and the functionalities it provides.

2019

pdf bib
A Little Linguistics Goes a Long Way: Unsupervised Segmentation with Limited Language Specific Guidance
Alexander Erdmann | Salam Khalifa | Mai Oudah | Nizar Habash | Houda Bouamor
Proceedings of the 16th Workshop on Computational Research in Phonetics, Phonology, and Morphology

We present de-lexical segmentation, a linguistically motivated alternative to greedy or other unsupervised methods, requiring only minimal language specific input. Our technique involves creating a small grammar of closed-class affixes which can be written in a few hours. The grammar over generates analyses for word forms attested in a raw corpus which are disambiguated based on features of the linguistic base proposed for each form. Extending the grammar to cover orthographic, morpho-syntactic or lexical variation is simple, making it an ideal solution for challenging corpora with noisy, dialect-inconsistent, or otherwise non-standard content. In two evaluations, we consistently outperform competitive unsupervised baselines and approach the performance of state-of-the-art supervised models trained on large amounts of data, providing evidence for the value of linguistic input during preprocessing.

pdf bib
The Impact of Preprocessing on Arabic-English Statistical and Neural Machine Translation
Mai Oudah | Amjad Almahairi | Nizar Habash
Proceedings of Machine Translation Summit XVII: Research Track

2012

pdf bib
A Pipeline Arabic Named Entity Recognition using a Hybrid Approach
Mai Oudah | Khaled Shaalan
Proceedings of COLING 2012