Investigating Low-resource Machine Translation for English-to-Tamil

Akshai Ramesh, Venkatesh Balavadhani parthasa, Rejwanul Haque, Andy Way


Abstract
Statistical machine translation (SMT), which was the dominant paradigm in machine translation (MT) research for nearly three decades, has recently been superseded by end-to-end deep learning approaches to MT. Although deep neural models produce state-of-the-art results in many translation tasks, they are found to under-perform in resource-poor scenarios. Despite some success, none of the present-day benchmarks that have tried to overcome this problem can be regarded as a universal solution to the problem of translating the many low-resource languages. In this work, we investigate the performance of phrase-based SMT (PB-SMT) and neural MT (NMT) on a rarely tested low-resource language pair, English-to-Tamil, taking a specialised data domain (software localisation) into consideration. In particular, we produce rankings of our MT systems via a social media platform-based human evaluation scheme, and present our findings for the low-resource, domain-specific text translation task.
Anthology ID:
2020.loresmt-1.15
Volume:
Proceedings of the 3rd Workshop on Technologies for MT of Low Resource Languages
Month:
December
Year:
2020
Address:
Suzhou, China
Venue:
LoResMT
Publisher:
Association for Computational Linguistics
Pages:
118–125
URL:
https://aclanthology.org/2020.loresmt-1.15
Cite (ACL):
Akshai Ramesh, Venkatesh Balavadhani parthasa, Rejwanul Haque, and Andy Way. 2020. Investigating Low-resource Machine Translation for English-to-Tamil. In Proceedings of the 3rd Workshop on Technologies for MT of Low Resource Languages, pages 118–125, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
Investigating Low-resource Machine Translation for English-to-Tamil (Ramesh et al., LoResMT 2020)
PDF:
https://preview.aclanthology.org/ingestion-script-update/2020.loresmt-1.15.pdf