Infosys Machine Translation System for WMT20 Similar Language Translation Task

Kamalkumar Rathinasamy, Amanpreet Singh, Balaguru Sivasambagupta, Prajna Prasad Neerchal, Vani Sivasankaran


Abstract
This paper describes Infosys’s submission to the WMT20 Similar Language Translation shared task. We participated in the Indo-Aryan language pair, in the Hindi-to-Marathi direction. Our baseline system is a transformer model with byte-pair encoding, trained with the Fairseq sequence modeling toolkit. Our final system, an ensemble of two transformer models, ranked first in the WMT20 evaluation. One model is designed to learn the nuances of translation for this low-resource language pair by taking advantage of the fact that the source and target languages share the same alphabet. The other model results from experimenting with the proportion of back-translated data relative to parallel data in order to improve translation fluency.
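The abstract's idea of varying the proportion of back-translated data to parallel data can be illustrated with a minimal sketch. It is not the authors' pipeline and does not use the Fairseq API: the function name `mix_parallel_and_back_translated`, the `bt_ratio` parameter, and the toy sentence pairs are all hypothetical, and the abstract does not state which ratios were tried.

```python
import random
from typing import List, Tuple

Pair = Tuple[str, str]  # (source sentence, target sentence)


def mix_parallel_and_back_translated(
    parallel: List[Pair],
    back_translated: List[Pair],
    bt_ratio: float,
    seed: int = 0,
) -> List[Pair]:
    """Build a training set with about `bt_ratio` synthetic pairs per genuine pair.

    Assumption: `back_translated` holds (synthetic source, genuine target) pairs,
    e.g. monolingual Marathi sentences translated into Hindi by a reverse model.
    """
    if bt_ratio < 0:
        raise ValueError("bt_ratio must be non-negative")
    rng = random.Random(seed)
    # Cap the request at the number of synthetic pairs actually available.
    n_bt = min(len(back_translated), round(bt_ratio * len(parallel)))
    sampled = rng.sample(back_translated, n_bt)
    # Keep every genuine pair, add the sampled synthetic pairs, then shuffle.
    mixed = list(parallel) + sampled
    rng.shuffle(mixed)
    return mixed


if __name__ == "__main__":
    # Toy placeholders standing in for real Hindi-Marathi sentence pairs.
    parallel = [(f"hi-{i}", f"mr-{i}") for i in range(4)]
    synthetic = [(f"hi-bt-{i}", f"mr-mono-{i}") for i in range(10)]
    mixed = mix_parallel_and_back_translated(parallel, synthetic, bt_ratio=2.0)
    print(len(mixed))  # 4 genuine pairs + 8 synthetic pairs
```

Sweeping `bt_ratio` over a few values and comparing the resulting models on a held-out set is one plausible way to run the kind of experiment the abstract mentions.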
Anthology ID:
2020.wmt-1.52
Volume:
Proceedings of the Fifth Conference on Machine Translation
Month:
November
Year:
2020
Address:
Online
Venues:
EMNLP | WMT
SIG:
SIGMT
Publisher:
Association for Computational Linguistics
Pages:
437–441
URL:
https://aclanthology.org/2020.wmt-1.52
Cite (ACL):
Kamalkumar Rathinasamy, Amanpreet Singh, Balaguru Sivasambagupta, Prajna Prasad Neerchal, and Vani Sivasankaran. 2020. Infosys Machine Translation System for WMT20 Similar Language Translation Task. In Proceedings of the Fifth Conference on Machine Translation, pages 437–441, Online. Association for Computational Linguistics.
Cite (Informal):
Infosys Machine Translation System for WMT20 Similar Language Translation Task (Rathinasamy et al., WMT 2020)
PDF:
https://preview.aclanthology.org/update-css-js/2020.wmt-1.52.pdf
Video:
https://slideslive.com/38939615