Rudali Huidrom


2022

pdf
Introducing EM-FT for Manipuri-English Neural Machine Translation
Rudali Huidrom | Yves Lepage
Proceedings of the WILDRE-6 Workshop within the 13th Language Resources and Evaluation Conference

This paper introduces a pretrained word embedding for Manipuri, a low-resourced Indian language. The pretrained word embedding based on FastText is capable of handling the highly agglutinating language Manipuri (mni). We then perform machine translation (MT) experiments using neural network (NN) models. In this paper, we confirm the following observations. Firstly, the reported BLEU score of the Transformer architecture with FastText word embedding model EM-FT performs better than without in all the NMT experiments. Secondly, we observe that adding more training data from a different domain of the test data negatively impacts translation accuracy. The resources reported in this paper are made available in the ELRA catalogue to help the low-resourced languages community with MT/NLP tasks.

2021

pdf
EM Corpus: a comparable corpus for a less-resourced language pair Manipuri-English
Rudali Huidrom | Yves Lepage | Khogendra Khomdram
Proceedings of the 14th Workshop on Building and Using Comparable Corpora (BUCC 2021)

In this paper, we introduce a sentence-level comparable text corpus crawled and created for the less-resourced language pair, Manipuri(mni) and English (eng). Our monolingual corpora comprise 1.88 million Manipuri sentences and 1.45 million English sentences, and our parallel corpus comprises 124,975 Manipuri-English sentence pairs. These data were crawled and collected over a year from August 2020 to March 2021 from a local newspaper website called ‘The Sangai Express.’ The resources reported in this paper are made available to help the low-resourced languages community for MT/NLP tasks.

2020

pdf
Zero-shot translation among Indian languages
Rudali Huidrom | Yves Lepage
Proceedings of the 3rd Workshop on Technologies for MT of Low Resource Languages

Standard neural machine translation (NMT) allows a model to perform translation between a pair of languages. Multilingual neural machine translation (NMT), on the other hand, allows a model to perform translation between several language pairs, even between language pairs for which no sentences pair has been seen during training (zero-shot translation). This paper presents experiments with zero-shot translation on low resource Indian languages with a very small amount of data for each language pair. We first report results on balanced data over all considered language pairs. We then expand our experiments for additional three rounds by increasing the training data with 2,000 sentence pairs in each round for some of the language pairs. We obtain an increase in translation accuracy with its balanced data settings score multiplied by 7 for Manipuri to Hindi during Round-III of zero-shot translation.