A two-level syntax-based approach to Arabic-English statistical machine translation

Charles Schafer, David Yarowsky


Abstract
We formulate an original model for statistical machine translation (SMT) inspired by characteristics of the Arabic-English translation task. Our approach incorporates part-of-speech tags and linguistically motivated phrase chunks in a two-level shallow syntactic model of reordering. We implement and evaluate this model, showing it to have advantageous properties and to be competitive with an existing SMT baseline. We also describe cross-categorial lexical translation coercion, an interesting component and side effect of our approach. Finally, we discuss the novel implementation of decoding for this model, which saves much development work by constructing finite-state machine (FSM) representations of translation probability distributions and using generic FSM operations for search. Algorithmic details, examples, and results focus on Arabic, and the paper includes a discussion of the issues and challenges of Arabic statistical machine translation.
Anthology ID:
2003.mtsummit-semit.11
Volume:
Workshop on Machine Translation for Semitic languages: issues and approaches
Month:
September 23-27
Year:
2003
Address:
New Orleans, USA
Venue:
MTSummit
URL:
https://aclanthology.org/2003.mtsummit-semit.11
Cite (ACL):
Charles Schafer and David Yarowsky. 2003. A two-level syntax-based approach to Arabic-English statistical machine translation. In Workshop on Machine Translation for Semitic languages: issues and approaches, New Orleans, USA.
Cite (Informal):
A two-level syntax-based approach to Arabic-English statistical machine translation (Schafer & Yarowsky, MTSummit 2003)
PDF:
https://preview.aclanthology.org/auto-file-uploads/2003.mtsummit-semit.11.pdf