Lexical Resources to Enrich English Malayalam Machine Translation

Sreelekha S, Pushpak Bhattacharyya


Abstract
In this paper we present our work on using lexical resources for machine translation between English and Malayalam. We compare the performance of different Statistical Machine Translation (SMT) systems built on top of a phrase-based SMT baseline. We explore different ways of utilizing lexical resources to improve the quality of English-Malayalam statistical machine translation. To enrich the training corpus, we augmented it with lexical resources in two ways: (a) additional vocabulary and (b) inflected verbal forms. The lexical resources include the IndoWordNet semantic relation set, lexical words, verb phrases, etc. We describe case studies and evaluations, and give a detailed error analysis for both the Malayalam-to-English and English-to-Malayalam machine translation systems. We observed significant improvements in the evaluated translation quality. Lexical resources do help improve performance when parallel corpora are scarce.
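The abstract describes enriching the training corpus with (a) additional vocabulary and (b) inflected verbal forms. A minimal sketch of that idea follows, assuming the lexical resources are simply appended to the parallel corpus as extra (English, Malayalam) segment pairs before phrase-based training. The function names, the verb-paradigm format, and the `ML(...)` target strings are hypothetical placeholders, not the paper's code or data.

```python
"""Toy sketch: enrich a small parallel corpus with lexical resources.

Hypothetical assumptions (not the paper's implementation): the corpus is a
list of (English, Malayalam) string pairs, the bilingual dictionary is a list
of word pairs, and verb paradigms are {lemma: {feature: (en_form, ml_form)}}.
Target-side strings are placeholders, not real Malayalam.
"""

Pair = tuple[str, str]


def expand_verb_paradigms(paradigms: dict[str, dict[str, Pair]]) -> list[Pair]:
    """Flatten verb paradigms into (English form, target form) pairs.

    Lemmas and features are visited in sorted order so output is deterministic.
    """
    pairs: list[Pair] = []
    for lemma in sorted(paradigms):
        forms = paradigms[lemma]
        for feature in sorted(forms):
            pairs.append(forms[feature])
    return pairs


def augment_corpus(corpus: list[Pair], *resources: list[Pair]) -> list[Pair]:
    """Append resource pairs as extra training segments.

    A resource pair is skipped if it already occurs in the corpus or in an
    earlier resource; the input corpus list itself is not modified.
    """
    seen = set(corpus)
    augmented = list(corpus)
    for resource in resources:
        for pair in resource:
            if pair not in seen:
                augmented.append(pair)
                seen.add(pair)
    return augmented


if __name__ == "__main__":
    corpus = [("the book is new", "ML(the book is new)")]
    # (a) additional vocabulary from a hypothetical bilingual dictionary
    dictionary = [("book", "ML(book)"), ("house", "ML(house)")]
    # (b) inflected verbal forms from a hypothetical paradigm table
    paradigms = {
        "go": {
            "future": ("will go", "ML(will go)"),
            "past": ("went", "ML(went)"),
        }
    }
    enriched = augment_corpus(corpus, dictionary, expand_verb_paradigms(paradigms))
    for src, tgt in enriched:
        print(src, "|||", tgt)
```

Appending dictionary entries as short segments is one common way to expose an SMT system to out-of-vocabulary words; whether the paper weighted or filtered these entries differently is not stated in the abstract.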
Anthology ID: L16-1098
Volume: Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
Month: May
Year: 2016
Address: Portorož, Slovenia
Venue: LREC
Publisher: European Language Resources Association (ELRA)
Pages: 620–627
URL: https://aclanthology.org/L16-1098
Cite (ACL): Sreelekha S and Pushpak Bhattacharyya. 2016. Lexical Resources to Enrich English Malayalam Machine Translation. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), pages 620–627, Portorož, Slovenia. European Language Resources Association (ELRA).
Cite (Informal): Lexical Resources to Enrich English Malayalam Machine Translation (S & Bhattacharyya, LREC 2016)
PDF: https://preview.aclanthology.org/ingestion-script-update/L16-1098.pdf