Abstract
Inflected languages in a low-resource setting present a data sparsity problem for statistical machine translation. In this paper, we present a minimally supervised algorithm for morpheme segmentation on Arabic dialects which reduces unknown words at translation time by over 50% and total vocabulary size by over 40%, and yields a significant increase in BLEU score over a previous state-of-the-art phrase-based statistical MT system.
- Anthology ID:
- 2006.amta-papers.21
- Volume:
- Proceedings of the 7th Conference of the Association for Machine Translation in the Americas: Technical Papers
- Month:
- August 8-12
- Year:
- 2006
- Address:
- Cambridge, Massachusetts, USA
- Venue:
- AMTA
- Publisher:
- Association for Machine Translation in the Americas
- Pages:
- 185–192
- URL:
- https://aclanthology.org/2006.amta-papers.21
- Cite (ACL):
- Jason Riesa and David Yarowsky. 2006. Minimally Supervised Morphological Segmentation with Applications to Machine Translation. In Proceedings of the 7th Conference of the Association for Machine Translation in the Americas: Technical Papers, pages 185–192, Cambridge, Massachusetts, USA. Association for Machine Translation in the Americas.
- Cite (Informal):
- Minimally Supervised Morphological Segmentation with Applications to Machine Translation (Riesa & Yarowsky, AMTA 2006)
- PDF:
- https://preview.aclanthology.org/fix-dup-bibkey/2006.amta-papers.21.pdf