Saumitra Yadav


2021

pdf
A3-108 Machine Translation System for LoResMT Shared Task @MT Summit 2021 Conference
Saumitra Yadav | Manish Shrivastava
Proceedings of the 4th Workshop on Technologies for MT of Low Resource Languages (LoResMT2021)

In this paper, we describe our submissions for LoResMT Shared Task @MT Summit 2021 Conference. We built statistical translation systems in each direction for English ⇐⇒ Marathi language pair. This paper outlines initial baseline experiments with various tokenization schemes to train models. Using optimal tokenization scheme we create synthetic data and further train augmented dataset to create more statistical models. Also, we reorder English to match Marathi syntax to further train another set of baseline and data augmented models using various tokenization schemes. We report configuration of the submitted systems and results produced by them.

pdf
A3-108 Machine Translation System for Similar Language Translation Shared Task 2021
Saumitra Yadav | Manish Shrivastava
Proceedings of the Sixth Conference on Machine Translation

In this paper, we describe our submissions for the Similar Language Translation Shared Task 2021. We built 3 systems in each direction for the Tamil ⇐⇒ Telugu language pair. This paper outlines experiments with various tokenization schemes to train statistical models. We also report the configuration of the submitted systems and results produced by them.

2020

pdf
A3-108 Machine Translation System for Similar Language Translation Shared Task 2020
Saumitra Yadav | Manish Shrivastava
Proceedings of the Fifth Conference on Machine Translation

In this paper, we describe our submissions for Similar Language Translation Shared Task 2020. We built 12 systems in each direction for Hindi⇐⇒Marathi language pair. This paper outlines initial baseline experiments with various tokenization schemes to train statistical models. Using optimal tokenization scheme among these we created synthetic source side text with back translation. And prune synthetic text with language model scores. This synthetic data was then used along with training data in various settings to build translation models. We also report configuration of the submitted systems and results produced by them.

2019

pdf
A3-108 Machine Translation System for LoResMT 2019
Saumitra Yadav | Vandan Mujadia | Manish Shrivastava
Proceedings of the 2nd Workshop on Technologies for MT of Low Resource Languages