Hassan Al-Haj


2010

Identifying Multi-word Expressions by Leveraging Morphological and Syntactic Idiosyncrasy
Hassan Al-Haj | Shuly Wintner
Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)

Turker-Assisted Paraphrasing for English-Arabic Machine Translation
Michael Denkowski | Hassan Al-Haj | Alon Lavie
Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon’s Mechanical Turk

The Impact of Arabic Morphological Segmentation on Broad-coverage English-to-Arabic Statistical Machine Translation
Hassan Al-Haj | Alon Lavie
Proceedings of the 9th Conference of the Association for Machine Translation in the Americas: Research Papers

Morphologically rich languages pose a challenge for statistical machine translation (SMT). This challenge is magnified when translating into a morphologically rich language. In this work we address this challenge in the framework of a broad-coverage English-to-Arabic phrase-based statistical machine translation (PBSMT) system. We explore the full spectrum of Arabic segmentation schemes, ranging from full word forms to fully segmented forms, and examine their effects on system performance. Our results show a difference of 2.61 BLEU points between the best and worst segmentation schemes, indicating that the choice of segmentation scheme has a significant effect on the performance of a PBSMT system in a large-data scenario. We also show that a simple segmentation scheme can perform as well as the best, more complicated one. Finally, we report results on a wide set of techniques for recombining the segmented Arabic output.
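Recombination is needed because a system trained on segmented Arabic emits clitics as separate tokens, which must be glued back into surface words before evaluation. The sketch below is a minimal illustration only and is not the paper's method. It assumes a hypothetical marker convention in which proclitics end in `+` (e.g. `w+`) and enclitics begin with `+` (e.g. `+h`), written in Buckwalter-style transliteration, and it simply concatenates tokens across those markers.

```python
from typing import List


def recombine(tokens: List[str]) -> List[str]:
    """Naively rejoin segmented tokens into surface words.

    Hypothetical convention (an assumption, not from the paper):
    a proclitic ends with '+' and attaches to the following token;
    an enclitic starts with '+' and attaches to the preceding token.
    No orthographic adjustments are made.
    """
    words: List[str] = []
    glue = False  # True when the previous token was a proclitic ending in '+'
    for tok in tokens:
        is_enclitic = tok.startswith("+")
        core = tok.strip("+")  # drop the boundary markers
        if (glue or is_enclitic) and words:
            words[-1] += core  # attach to the word being built
        else:
            words.append(core)  # start a new surface word
        glue = tok.endswith("+")
    return words


print(recombine(["w+", "ktb", "+h"]))
print(recombine(["Al+", "ktAb", "jdyd"]))
```

Pure concatenation is the simplest possible baseline. Real Arabic output can also require orthographic adjustments when morphemes are rejoined (for example, the ta marbuta / ta alternation when a pronominal enclitic attaches), and this sketch deliberately omits them.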