Abstract
Statistical Machine Translation (SMT) of highly inflected, low-resource languages suffers from the problem of low bitext availability, which is exacerbated by large inflectional paradigms. When translating into English, rich source inflections have a high chance of being poorly estimated or out-of-vocabulary (OOV). We present a source language-agnostic system for automatically constructing phrase pairs from foreign-language inflections and their morphological analyses using manually constructed datasets, including Wiktionary. We then demonstrate the utility of these phrase tables in improving translation into English from Finnish, Czech, and Turkish in simulated low-resource settings, finding substantial gains in translation quality. We report up to +2.58 BLEU in a simulated low-resource setting and +1.65 BLEU in a moderate-resource setting. We release our morphologically motivated translation models, with tens of thousands of inflections in each of 8 languages.
- Anthology ID:
- 2016.amta-researchers.14
- Volume:
- Conferences of the Association for Machine Translation in the Americas: MT Researchers' Track
- Month:
- October 28 - November 1
- Year:
- 2016
- Address:
- Austin, TX, USA
- Editors:
- Spence Green, Lane Schwartz
- Venue:
- AMTA
- Publisher:
- The Association for Machine Translation in the Americas
- Pages:
- 177–190
- URL:
- https://preview.aclanthology.org/icon-24-ingestion/2016.amta-researchers.14/
- Cite (ACL):
- John Hewitt, Matt Post, and David Yarowsky. 2016. Automatic Construction of Morphologically Motivated Translation Models for Highly Inflected, Low-Resource Languages. In Conferences of the Association for Machine Translation in the Americas: MT Researchers' Track, pages 177–190, Austin, TX, USA. The Association for Machine Translation in the Americas.
- Cite (Informal):
- Automatic Construction of Morphologically Motivated Translation Models for Highly Inflected, Low-Resource Languages (Hewitt et al., AMTA 2016)
- PDF:
- https://preview.aclanthology.org/icon-24-ingestion/2016.amta-researchers.14.pdf
- Code:
- john-hewitt/morph16