Rajesh Kumar Mundotiya


2020

pdf
Attention-based Domain Adaption Using Transfer Learning for Part-of-Speech Tagging: An Experiment on the Hindi language
Rajesh Kumar Mundotiya | Vikrant Kumar | Arpit Mehta | Anil Kumar Singh
Proceedings of the 34th Pacific Asia Conference on Language, Information and Computation

pdf
NLPRL Odia-English: Indic Language Neural Machine Translation System
Rupjyoti Baruah | Rajesh Kumar Mundotiya
Proceedings of the 7th Workshop on Asian Translation

In this manuscript, we (team name is NLPRL) describe systems description that was submitted to the translation shared tasks at WAT 2020. We describe our model as transformer based NMT by using byte-level based BPE (BBPE). We used the OdiEnCorp 2.0 parallel corpus provided by the shared task organizer where the training, validation, and test data contain 69370, 13544, and 14344 lines of parallel sentences, respectively. The evaluation results show the BLEU score of English-to-Oria below the Organizer (1.34) and Oria-to-English direction shows above the Organizer (11.33).

pdf
Transformer-based Neural Machine Translation System for Hindi – Marathi: WMT20 Shared Task
Amit Kumar | Rupjyoti Baruah | Rajesh Kumar Mundotiya | Anil Kumar Singh
Proceedings of the Fifth Conference on Machine Translation

This paper reports the results for the Machine Translation (MT) system submitted by the NLPRL team for the Hindi – Marathi Similar Translation Task at WMT 2020. We apply the Transformer-based Neural Machine Translation (NMT) approach on both translation directions for this language pair. The trained model is evaluated on the corpus provided by shared task organizers, using BLEU, RIBES, and TER scores. There were a total of 23 systems submitted for Marathi to Hindi and 21 systems submitted for Hindi to Marathi in the shared task. Out of these, our submission ranked 6th and 9th, respectively.

pdf
NLPRL System for Very Low Resource Supervised Machine Translation
Rupjyoti Baruah | Rajesh Kumar Mundotiya | Amit Kumar | Anil kumar Singh
Proceedings of the Fifth Conference on Machine Translation

This paper describes the results of the system that we used for the WMT20 very low resource (VLR) supervised MT shared task. For our experiments, we use a byte-level version of BPE, which requires a base vocabulary of size 256 only. BPE based models are a kind of sub-word models. Such models try to address the Out of Vocabulary (OOV) word problem by performing word segmentation so that segments correspond to morphological units. They are also reported to work across different languages, especially similar languages due to their sub-word nature. Based on BLEU cased score, our NLPRL systems ranked ninth for HSB to GER and tenth in GER to HSB translation scenario.

pdf
Unsupervised Approach for Zero-Shot Experiments: Bhojpuri–Hindi and Magahi–Hindi@LoResMT 2020
Amit Kumar | Rajesh Kumar Mundotiya | Anil Kumar Singh
Proceedings of the 3rd Workshop on Technologies for MT of Low Resource Languages

This paper reports a Machine Translation (MT) system submitted by the NLPRL team for the Bhojpuri–Hindi and Magahi–Hindi language pairs at LoResMT 2020 shared task. We used an unsupervised domain adaptation approach that gives promising results for zero or extremely low resource languages. Task organizers provide the development and the test sets for evaluation and the monolingual data for training. Our approach is a hybrid approach of domain adaptation and back-translation. Metrics used to evaluate the trained model are BLEU, RIBES, Precision, Recall and F-measure. Our approach gives relatively promising results, with a wide range, of 19.5, 13.71, 2.54, and 3.16 BLEU points for Bhojpuri to Hindi, Magahi to Hindi, Hindi to Bhojpuri and Hindi to Magahi language pairs, respectively.

pdf
NLPRL at WNUT-2020 Task 2: ELMo-based System for Identification of COVID-19 Tweets
Rajesh Kumar Mundotiya | Rupjyoti Baruah | Bhavana Srivastava | Anil Kumar Singh
Proceedings of the Sixth Workshop on Noisy User-generated Text (W-NUT 2020)

The Coronavirus pandemic has been a dominating news on social media for the last many months. Efforts are being made to reduce its spread and reduce the casualties as well as new infections. For this purpose, the information about the infected people and their related symptoms, as available on social media, such as Twitter, can help in prevention and taking precautions. This is an example of using noisy text processing for disaster management. This paper discusses the NLPRL results in Shared Task-2 of WNUT-2020 workshop. We have considered this problem as a binary classification problem and have used a pre-trained ELMo embedding with GRU units. This approach helps classify the tweets with accuracy as 80.85% and 78.54% as F1-score on the provided test dataset. The experimental code is available online.