A3-108 Machine Translation System for Similar Language Translation Shared Task 2020

Saumitra Yadav, Manish Shrivastava


Abstract
In this paper, we describe our submissions to the Similar Language Translation Shared Task 2020. We built 12 systems in each direction for the Hindi⇐⇒Marathi language pair. This paper outlines initial baseline experiments with various tokenization schemes for training statistical models. Using the optimal tokenization scheme among these, we created synthetic source-side text with back-translation and pruned the synthetic text using language model scores. This synthetic data was then used along with the training data in various settings to build translation models. We also report the configurations of the submitted systems and the results they produced.
Anthology ID:
2020.wmt-1.55
Volume:
Proceedings of the Fifth Conference on Machine Translation
Month:
November
Year:
2020
Address:
Online
Venue:
WMT
SIG:
SIGMT
Publisher:
Association for Computational Linguistics
Pages:
451–455
URL:
https://aclanthology.org/2020.wmt-1.55
Cite (ACL):
Saumitra Yadav and Manish Shrivastava. 2020. A3-108 Machine Translation System for Similar Language Translation Shared Task 2020. In Proceedings of the Fifth Conference on Machine Translation, pages 451–455, Online. Association for Computational Linguistics.
Cite (Informal):
A3-108 Machine Translation System for Similar Language Translation Shared Task 2020 (Yadav & Shrivastava, WMT 2020)
PDF:
https://preview.aclanthology.org/auto-file-uploads/2020.wmt-1.55.pdf
Video:
https://slideslive.com/38939590