Vani Sivasankaran


2020

Infosys Machine Translation System for WMT20 Similar Language Translation Task
Kamalkumar Rathinasamy | Amanpreet Singh | Balaguru Sivasambagupta | Prajna Prasad Neerchal | Vani Sivasankaran
Proceedings of the Fifth Conference on Machine Translation

This paper describes Infosys’s submission to the WMT20 Similar Language Translation shared task. We participated in the Indo-Aryan language pair, in the Hindi-to-Marathi direction. Our baseline system is a transformer model with byte-pair encoding, trained with the Fairseq sequence modeling toolkit. Our final system, an ensemble of two transformer models, ranked first in the WMT20 evaluation. One model is designed to learn the nuances of translating this low-resource language pair by exploiting the fact that the source and target languages share the same alphabet. The other model results from experiments with the proportion of back-translated data relative to parallel data, aimed at improving translation fluency.