Ahmed El Kholy - ACL Anthology


Ahmed El Kholy

Also published as: Ahmed El Kholy

2015

Morphological constraints for phrase pivot statistical machine translation
Ahmed El Kholy | Nizar Habash
Proceedings of Machine Translation Summit XV: Papers

2014

Alignment symmetrisation optimization targeting phrase pivot statistical machine translation
Ahmed El Kholy | Nizar Habash
Proceedings of the 17th Annual Conference of the European Association for Machine Translation

MADAMIRA: A Fast, Comprehensive Tool for Morphological Analysis and Disambiguation of Arabic
Arfath Pasha | Mohamed Al-Badrashiny | Mona Diab | Ahmed El Kholy | Ramy Eskander | Nizar Habash | Manoj Pooleery | Owen Rambow | Ryan Roth
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

In this paper, we present MADAMIRA, a system for morphological analysis and disambiguation of Arabic that combines some of the best aspects of two previously widely used systems for Arabic processing, MADA (Habash and Rambow, 2005; Habash et al., 2009; Habash et al., 2013) and AMIRA (Diab et al., 2007). MADAMIRA improves upon the two systems with a more streamlined Java implementation that is more robust, portable, and extensible, and is faster than its ancestors by more than an order of magnitude. We also discuss an online demo (see http://nlp.ldeo.columbia.edu/madamira/) that highlights these aspects.

2013

Orthographic and Morphological Processing for Persian-to-English Statistical Machine Translation
Mohammad Sadegh Rasooli | Ahmed El Kholy | Nizar Habash
Proceedings of the Sixth International Joint Conference on Natural Language Processing

Selective Combination of Pivot and Direct Statistical Machine Translation Models
Ahmed El Kholy | Nizar Habash | Gregor Leusch | Evgeny Matusov | Hassan Sawaf
Proceedings of the Sixth International Joint Conference on Natural Language Processing

Language Independent Connectivity Strength Features for Phrase Pivot Statistical Machine Translation
Ahmed El Kholy | Nizar Habash | Gregor Leusch | Evgeny Matusov | Hassan Sawaf
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

2012

Translate, Predict or Generate: Modeling Rich Morphology in Statistical Machine Translation
Ahmed El Kholy | Nizar Habash
Proceedings of the 16th Annual Conference of the European Association for Machine Translation

Rich Morphology Generation Using Statistical Machine Translation
Ahmed El Kholy | Nizar Habash
INLG 2012 Proceedings of the Seventh International Natural Language Generation Conference

2011

Automatic Error Analysis for Morphologically Rich Languages
Ahmed El Kholy | Nizar Habash
Proceedings of Machine Translation Summit XIII: Papers

Filtering Antonymous, Trend-Contrasting, and Polarity-Dissimilar Distributional Paraphrases for Improving Statistical Machine Translation
Yuval Marton | Ahmed El Kholy | Nizar Habash
Proceedings of the Sixth Workshop on Statistical Machine Translation

2010

Orthographic and Morphological Processing for English-Arabic Statistical Machine Translation
Ahmed El Kholy | Nizar Habash
Actes de la 17e conférence sur le Traitement Automatique des Langues Naturelles. Articles longs

Much of the work on Statistical Machine Translation (SMT) from morphologically rich languages has shown that morphological tokenization and orthographic normalization help improve SMT quality because of the sparsity reduction they contribute. In this paper, we study the effect of these processes on SMT when translating into a morphologically rich language, namely Arabic. We explore a space of tokenization schemes and normalization options. We only evaluate on detokenized and orthographically correct (enriched) output. Our results show that the best performing tokenization scheme is that of the Penn Arabic Treebank. Additionally, training on orthographically normalized (reduced) text then jointly enriching and detokenizing the output outperforms training on enriched text.

Co-authors

Mohamed Al-Badrashiny (1)

Ramy Eskander (1)

Manoj Pooleery (1)

Mohammad Sadegh Rasooli (1)

Venues

jeptalnrecital (1)