2020
CAMeL Tools: An Open Source Python Toolkit for Arabic Natural Language Processing
Ossama Obeid | Nasser Zalmout | Salam Khalifa | Dima Taji | Mai Oudah | Bashar Alhafni | Go Inoue | Fadhl Eryani | Alexander Erdmann | Nizar Habash
Proceedings of the Twelfth Language Resources and Evaluation Conference
We present CAMeL Tools, a collection of open-source tools for Arabic natural language processing in Python. CAMeL Tools currently provides utilities for pre-processing, morphological modeling, Dialect Identification, Named Entity Recognition and Sentiment Analysis. In this paper, we describe the design of CAMeL Tools and the functionalities it provides.
2019
A Little Linguistics Goes a Long Way: Unsupervised Segmentation with Limited Language Specific Guidance
Alexander Erdmann | Salam Khalifa | Mai Oudah | Nizar Habash | Houda Bouamor
Proceedings of the 16th Workshop on Computational Research in Phonetics, Phonology, and Morphology
We present de-lexical segmentation, a linguistically motivated alternative to greedy or other unsupervised methods, requiring only minimal language-specific input. Our technique involves creating a small grammar of closed-class affixes, which can be written in a few hours. The grammar overgenerates analyses for word forms attested in a raw corpus, and these analyses are then disambiguated based on features of the linguistic base proposed for each form. Extending the grammar to cover orthographic, morpho-syntactic, or lexical variation is simple, making it an ideal solution for challenging corpora with noisy, dialect-inconsistent, or otherwise non-standard content. In two evaluations, we consistently outperform competitive unsupervised baselines and approach the performance of state-of-the-art supervised models trained on large amounts of data, providing evidence for the value of linguistic input during preprocessing.
The Impact of Preprocessing on Arabic-English Statistical and Neural Machine Translation
Mai Oudah | Amjad Almahairi | Nizar Habash
Proceedings of Machine Translation Summit XVII: Research Track
2012
A Pipeline Arabic Named Entity Recognition using a Hybrid Approach
Mai Oudah | Khaled Shaalan
Proceedings of COLING 2012