Addressing some Issues of Data Sparsity towards Improving English-Manipuri SMT using Morphological Information

Thoudam Doren Singh


Abstract
The performance of an SMT system depends heavily on the availability of large parallel corpora. For many language pairs, these resources are not available in the required amount, which is a challenging issue. SMT systems involving a morphologically rich and highly agglutinative language require substantially larger resources. This paper investigates some of the issues involved in enriching the resources for such languages. Handling the inflectional and derivational morphemes of the morphologically rich target language plays an important role in the enrichment process. Mapping from the source side to the target side is carried out for the English-Manipuri SMT task using a factored model. The resulting SMT system improves over the baseline system in terms of both automatic scores and subjective evaluation.
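As a rough illustration of what a factored representation of an agglutinative target side can look like, the sketch below splits each word into a stem and a suffix and writes it in the pipe-separated `surface|lemma|suffix` token style used for factored training data in Moses. This is not the paper's method: the longest-match suffix stripping, the `min_stem` threshold, the `null` placeholder for words without a suffix, and the suffix list passed in by the caller are all assumptions made for illustration, and a real system would rely on a proper morphological analyser for Manipuri.

```python
from typing import Iterable, List, Tuple


def split_suffix(word: str, suffixes: Iterable[str], min_stem: int = 2) -> Tuple[str, str]:
    """Strip the longest matching suffix from `word`.

    A suffix is only accepted if at least `min_stem` characters remain as the stem
    (an assumed safeguard against over-segmenting short words).
    Returns (stem, suffix), using the hypothetical placeholder "null" when no suffix matches.
    """
    best = ""
    for suf in suffixes:
        if word.endswith(suf) and len(word) - len(suf) >= min_stem and len(suf) > len(best):
            best = suf
    if not best:
        return word, "null"
    return word[: len(word) - len(best)], best


def to_factored(sentence: str, suffixes: Iterable[str]) -> str:
    """Convert a whitespace-tokenised sentence into pipe-separated factored tokens."""
    suffix_list = list(suffixes)  # materialise so it can be scanned once per word
    factored: List[str] = []
    for word in sentence.split():
        lemma, suffix = split_suffix(word, suffix_list)
        factored.append(f"{word}|{lemma}|{suffix}")
    return " ".join(factored)
```

For example, with a hypothetical suffix list `["da", "gi", "dagi"]`, `to_factored("stemda word", ["da", "gi", "dagi"])` yields `stemda|stem|da word|word|null`; the placeholder strings stand in for real Manipuri tokens. Separating the suffix into its own factor lets sparse inflected surface forms share statistics through their common stem, which is the kind of data-sparsity relief the abstract motivates.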
Anthology ID:
2012.amta-monomt.6
Volume:
Workshop on Monolingual Machine Translation
Month:
October 28-November 1
Year:
2012
Address:
San Diego, California, USA
Venue:
AMTA
Publisher:
Association for Machine Translation in the Americas
URL:
https://aclanthology.org/2012.amta-monomt.6
Cite (ACL):
Thoudam Doren Singh. 2012. Addressing some Issues of Data Sparsity towards Improving English- Manipuri SMT using Morphological Information. In Workshop on Monolingual Machine Translation, San Diego, California, USA. Association for Machine Translation in the Americas.
Cite (Informal):
Addressing some Issues of Data Sparsity towards Improving English- Manipuri SMT using Morphological Information (Singh, AMTA 2012)
PDF:
https://preview.aclanthology.org/author-url/2012.amta-monomt.6.pdf