Abstract
We formulate an original model for statistical machine translation (SMT) inspired by characteristics of the Arabic-English translation task. Our approach incorporates part-of-speech tags and linguistically motivated phrase chunks in a two-level shallow syntactic model of reordering. We implement and evaluate this model, showing it to have advantageous properties and to be competitive with an existing SMT baseline. We also describe cross-categorial lexical translation coercion, an interesting component and side effect of our approach. Finally, we discuss a novel implementation of decoding for this model, which saves much development work by constructing finite-state machine (FSM) representations of translation probability distributions and using generic FSM operations for search. Algorithmic details, examples, and results focus on Arabic, and the paper includes discussion of the issues and challenges of Arabic statistical machine translation.
- Anthology ID: 2003.mtsummit-semit.11
- Volume: Workshop on Machine Translation for Semitic languages: issues and approaches
- Month: September 23-27
- Year: 2003
- Address: New Orleans, USA
- Venue: MTSummit
- URL: https://aclanthology.org/2003.mtsummit-semit.11
- Cite (ACL): Charles Schafer and David Yarowsky. 2003. A two-level syntax-based approach to Arabic-English statistical machine translation. In Workshop on Machine Translation for Semitic languages: issues and approaches, New Orleans, USA.
- Cite (Informal): A two-level syntax-based approach to Arabic-English statistical machine translation (Schafer & Yarowsky, MTSummit 2003)
- PDF: https://preview.aclanthology.org/landing_page/2003.mtsummit-semit.11.pdf