Loitongbam Sanayai Meetei

2021

pdf bib abs
Multiple Captions Embellished Multilingual Multi-Modal Neural Machine Translation
Salam Michael Singh | Loitongbam Sanayai Meetei | Thoudam Doren Singh | Sivaji Bandyopadhyay
Proceedings of the First Workshop on Multimodal Machine Translation for Low Resource Languages (MMTLRL 2021)

Neural machine translation based on bilingual text with limited training data suffers from lexical diversity, which lowers the rare word translation accuracy and reduces the generalizability of the translation system. In this work, we utilise the multiple captions from the Multi-30K dataset to increase the lexical diversity aided with the cross-lingual transfer of information among the languages in a multilingual setup. In this multilingual and multimodal setting, the inclusion of the visual features boosts the translation quality by a significant margin. Empirical study affirms that our proposed multimodal approach achieves substantial gain in terms of the automatic score and shows robustness in handling the rare word translation in the pretext of English to/from Hindi and Telugu translation tasks.

pdf abs
Low Resource Multimodal Neural Machine Translation of English-Hindi in News Domain
Loitongbam Sanayai Meetei | Thoudam Doren Singh | Sivaji Bandyopadhyay
Proceedings of the First Workshop on Multimodal Machine Translation for Low Resource Languages (MMTLRL 2021)

Incorporating multiple input modalities in a machine translation (MT) system is gaining popularity among MT researchers. Unlike the publicly available dataset for Multimodal Machine Translation (MMT) tasks, where the captions are short image descriptions, the news captions provide a more detailed description of the contents of the images. As a result, numerous named entities relating to specific persons, locations, etc., are found. In this paper, we acquire two monolingual news datasets reported in English and Hindi paired with the images to generate a synthetic English-Hindi parallel corpus. The parallel corpus is used to train the English-Hindi Neural Machine Translation (NMT) and an English-Hindi MMT system by incorporating the image feature paired with the corresponding parallel corpus. We also conduct a systematic analysis to evaluate the English-Hindi MT systems with 1) more synthetic data and 2) by adding back-translated data. Our finding shows improvement in terms of BLEU scores for both the NMT (+8.05) and MMT (+11.03) systems.

pdf abs
An Experiment on Speech-to-Text Translation Systems for Manipuri to English on Low Resource Setting
Loitongbam Sanayai Meetei | Laishram Rahul | Alok Singh | Salam Michael Singh | Thoudam Doren Singh | Sivaji Bandyopadhyay
Proceedings of the 18th International Conference on Natural Language Processing (ICON)

In this paper, we report the experimental findings of building Speech-to-Text translation systems for Manipuri-English on low resource setting which is first of its kind in this language pair. For this purpose, a new dataset consisting of a Manipuri-English parallel corpus along with the corresponding audio version of the Manipuri text is built. Based on this dataset, a benchmark evaluation is reported for the Manipuri-English Speech-to-Text translation using two approaches: 1) a pipeline model consisting of ASR (Automatic Speech Recognition) and Machine translation, and 2) an end-to-end Speech-to-Text translation. Gaussian Mixture Model-Hidden Markov Model (GMM-HMM) and Time delay neural network (TDNN) Acoustic models are used to build two different pipeline systems using a shared MT system. Experimental result shows that the TDNN model outperforms GMM-HMM model significantly by a margin of 2.53% WER. However, their evaluation of Speech-to-Text translation differs by a small margin of 0.1 BLEU. Both the pipeline translation models outperform the end-to-end translation model by a margin of 2.6 BLEU score.

pdf abs
On the Transferability of Massively Multilingual Pretrained Models in the Pretext of the Indo-Aryan and Tibeto-Burman Languages
Salam Michael Singh | Loitongbam Sanayai Meetei | Alok Singh | Thoudam Doren Singh | Sivaji Bandyopadhyay
Proceedings of the 18th International Conference on Natural Language Processing (ICON)

In recent times, machine translation models can learn to perform implicit bridging between language pairs never seen explicitly during training and showing that transfer learning helps for languages with constrained resources. This work investigates the low resource machine translation via transfer learning from multilingual pre-trained models i.e. mBART-50 and mT5-base in the pretext of Indo-Aryan (Assamese and Bengali) and Tibeto-Burman (Manipuri) languages via finetuning as a downstream task. Assamese and Manipuri were absent in the pretraining of both mBART-50 and the mT5 models. However, the experimental results attest that the finetuning from these pre-trained models surpasses the multilingual model trained from scratch.

pdf abs
An Efficient Keyframes Selection Based Framework for Video Captioning
Alok Singh | Loitongbam Sanayai Meetei | Salam Michael Singh | Thoudam Doren Singh | Sivaji Bandyopadhyay
Proceedings of the 18th International Conference on Natural Language Processing (ICON)

Describing a video is a challenging yet attractive task since it falls into the intersection of computer vision and natural language generation. The attention-based models have reported the best performance. However, all these models follow similar procedures, such as segmenting videos into chunks of frames or sampling frames at equal intervals for visual encoding. The process of segmenting video into chunks or sampling frames at equal intervals causes encoding of redundant visual information and requires additional computational cost since a video consists of a sequence of similar frames and suffers from inescapable noise such as uneven illumination, occlusion and motion effects. In this paper, a boundary-based keyframes selection approach for video description is proposed that allow the system to select a compact subset of keyframes to encode the visual information and generate a description for a video without much degradation. The proposed approach uses 3 4 frames per video and yields competitive performance over two benchmark datasets MSVD and MSR-VTT (in both English and Hindi).

2020

pdf abs
English to Manipuri and Mizo Post-Editing Effort and its Impact on Low Resource Machine Translation
Loitongbam Sanayai Meetei | Thoudam Doren Singh | Sivaji Bandyopadhyay | Mihaela Vela | Josef van Genabith
Proceedings of the 17th International Conference on Natural Language Processing (ICON)

We present the first study on the post-editing (PE) effort required to build a parallel dataset for English-Manipuri and English-Mizo, in the context of a project on creating data for machine translation (MT). English source text from a local daily newspaper are machine translated into Manipuri and Mizo using PBSMT systems built in-house. A Computer Assisted Translation (CAT) tool is used to record the time, keystroke and other indicators to measure PE effort in terms of temporal and technical effort. A positive correlation between the technical effort and the number of function words is seen for English-Manipuri and English-Mizo but a negative correlation between the technical effort and the number of noun words for English-Mizo. However, average time spent per token in PE English-Mizo text is negatively correlated with the temporal effort. The main reason for these results are due to (i) English and Mizo using the same script, while Manipuri uses a different script and (ii) the agglutinative nature of Manipuri. Further, we check the impact of training a MT system in an incremental approach, by including the post-edited dataset as additional training data. The result shows an increase in HBLEU of up to 4.6 for English-Manipuri.

2019

pdf abs
WAT2019: English-Hindi Translation on Hindi Visual Genome Dataset
Loitongbam Sanayai Meetei | Thoudam Doren Singh | Sivaji Bandyopadhyay
Proceedings of the 6th Workshop on Asian Translation

A multimodal translation is a task of translating a source language to a target language with the help of a parallel text corpus paired with images that represent the contextual details of the text. In this paper, we carried out an extensive comparison to evaluate the benefits of using a multimodal approach on translating text in English to a low resource language, Hindi as a part of WAT2019 shared task. We carried out the translation of English to Hindi in three separate tasks with both the evaluation and challenge dataset. First, by using only the parallel text corpora, then through an image caption generation approach and, finally with the multimodal approach. Our experiment shows a significant improvement in the result with the multimodal approach than the other approach.

Co-authors

Josef van Genabith 1

Laishram Rahul 1