Amit Kumar

2023

ODA_SRIB at SemEval-2023 Task 9: A Multimodal Approach for Improved Intimacy Analysis
Priyanshu Kumar | Amit Kumar | Jiban Prakash | Prabhat Lamba | Irfan Abdul
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)

We experiment with XLM-Twitter and XLM-RoBERTa models to predict intimacy scores in Tweets, i.e., the extent to which a Tweet contains intimate content. We propose a Transformer-TabNet based multimodal architecture using text data and statistical features from the text, which performs better than the vanilla Transformer-based model. We further experiment with Adversarial Weight Perturbation to make our models more general and robust. The ensemble of four of our best models achieves an overall Pearson coefficient of 0.5893 on the test dataset.
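
As a rough illustration of the evaluation described above, the sketch below averages the predictions of several models and scores the result with the Pearson coefficient. The abstract does not say how the four models were combined, so the simple averaging here is an assumption, and the function names `pearson` and `average_ensemble` are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)


def average_ensemble(predictions):
    """Average per-example scores across models.

    `predictions` is a list with one list of scores per model.
    Plain averaging is an assumption, not the authors' stated method.
    """
    return [sum(column) / len(column) for column in zip(*predictions)]


if __name__ == "__main__":
    # Toy numbers for illustration only; these are not results from the paper.
    gold = [1.0, 2.5, 3.0, 4.5]
    model_outputs = [
        [1.2, 2.0, 3.1, 4.0],
        [0.8, 2.8, 2.9, 4.6],
    ]
    combined = average_ensemble(model_outputs)
    print(round(pearson(combined, gold), 4))
```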

2021

Gated Transformer for Robust De-noised Sequence-to-Sequence Modelling
Ayan Sengupta | Amit Kumar | Sourabh Kumar Bhattacharjee | Suman Roy
Findings of the Association for Computational Linguistics: EMNLP 2021

Robust sequence-to-sequence modelling is an essential task in the real world, where inputs are often noisy. Both user-generated and machine-generated inputs contain various kinds of noise, such as spelling mistakes, grammatical errors, and character recognition errors, all of which impact downstream tasks and affect the interpretability of texts. In this work, we devise a novel sequence-to-sequence architecture for detecting and correcting different real-world and artificial noises (adversarial attacks) in English texts. To this end, we propose a modified Transformer-based encoder-decoder architecture that uses a gating mechanism to detect the types of corrections required and corrects texts accordingly. Experimental results show that our gated architecture with pre-trained language models performs significantly better than its non-gated counterparts and other state-of-the-art error correction models in correcting spelling and grammatical errors. Extrinsic evaluation of our model on Machine Translation (MT) and Summarization tasks shows the competitive performance of the model against other generative sequence-to-sequence models under noisy inputs.

2020

Unsupervised Approach for Zero-Shot Experiments: Bhojpuri–Hindi and Magahi–Hindi@LoResMT 2020
Amit Kumar | Rajesh Kumar Mundotiya | Anil Kumar Singh
Proceedings of the 3rd Workshop on Technologies for MT of Low Resource Languages

This paper reports a Machine Translation (MT) system submitted by the NLPRL team for the Bhojpuri–Hindi and Magahi–Hindi language pairs at the LoResMT 2020 shared task. We used an unsupervised domain adaptation approach that gives promising results for zero or extremely low resource languages. The task organizers provide the development and test sets for evaluation and the monolingual data for training. Our approach is a hybrid of domain adaptation and back-translation. The metrics used to evaluate the trained model are BLEU, RIBES, Precision, Recall, and F-measure. Our approach gives relatively promising results that vary widely across directions: 19.5, 13.71, 2.54, and 3.16 BLEU points for Bhojpuri to Hindi, Magahi to Hindi, Hindi to Bhojpuri, and Hindi to Magahi, respectively.

Transformer-based Neural Machine Translation System for Hindi – Marathi: WMT20 Shared Task
Amit Kumar | Rupjyoti Baruah | Rajesh Kumar Mundotiya | Anil Kumar Singh
Proceedings of the Fifth Conference on Machine Translation

This paper reports the results for the Machine Translation (MT) system submitted by the NLPRL team for the Hindi – Marathi Similar Translation Task at WMT 2020. We apply the Transformer-based Neural Machine Translation (NMT) approach in both translation directions for this language pair. The trained model is evaluated on the corpus provided by the shared task organizers, using BLEU, RIBES, and TER scores. A total of 23 systems were submitted for Marathi to Hindi and 21 systems for Hindi to Marathi in the shared task. Among these, our submission ranked 6th and 9th, respectively.

NLPRL System for Very Low Resource Supervised Machine Translation
Rupjyoti Baruah | Rajesh Kumar Mundotiya | Amit Kumar | Anil Kumar Singh
Proceedings of the Fifth Conference on Machine Translation

This paper describes the results of the system that we used for the WMT20 very low resource (VLR) supervised MT shared task. For our experiments, we use a byte-level version of BPE, which requires a base vocabulary of size 256 only. BPE-based models are a kind of sub-word model. Such models try to address the Out of Vocabulary (OOV) word problem by performing word segmentation so that segments correspond to morphological units. They are also reported to work across different languages, especially similar languages, due to their sub-word nature. Based on the cased BLEU score, our NLPRL systems ranked ninth for HSB to GER and tenth for GER to HSB translation.
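
To make the byte-level BPE idea above concrete, here is a minimal sketch of why the base vocabulary has only 256 entries: every string is first reduced to its UTF-8 bytes, and learned merges receive ids from 256 upward. This is a toy illustration, not the system's actual tokenizer; the function names are hypothetical and the greedy merge loop is a simplified assumption.

```python
from collections import Counter


def byte_tokens(text):
    """Encode text as UTF-8 and return one integer token (0-255) per byte."""
    return list(text.encode("utf-8"))


def most_frequent_pair(tokens):
    """Return the most common adjacent token pair, or None if there is none."""
    pairs = Counter(zip(tokens, tokens[1:]))
    return pairs.most_common(1)[0][0] if pairs else None


def merge_pair(tokens, pair, new_id):
    """Replace each non-overlapping occurrence of `pair` with `new_id`."""
    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            merged.append(new_id)
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged


def learn_merges(text, num_merges):
    """Greedily learn up to `num_merges` merges on top of the 256 byte tokens."""
    tokens = byte_tokens(text)
    merges = {}
    for step in range(num_merges):
        pair = most_frequent_pair(tokens)
        if pair is None:
            break
        new_id = 256 + step  # ids below 256 are reserved for raw bytes
        merges[pair] = new_id
        tokens = merge_pair(tokens, pair, new_id)
    return tokens, merges


if __name__ == "__main__":
    # Non-ASCII characters expand to several bytes, so no input is ever OOV.
    print(byte_tokens("ž"))
    print(learn_merges("aaaa", 1))
```

Because any character decomposes into bytes already in the base vocabulary, unseen words can always be represented, which is the property the OOV discussion above relies on.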

2019

NLPRL at WAT2019: Transformer-based Tamil – English Indic Task Neural Machine Translation System
Amit Kumar | Anil Kumar Singh
Proceedings of the 6th Workshop on Asian Translation

This paper describes the Machine Translation system for the Tamil-English Indic Task organized at WAT 2019. We use a Transformer-based architecture for Neural Machine Translation.