Amulya Ratna Dash


2026

Recent advances in Multimodal Machine Translation (MMT) have attempted to address ambiguity and polysemy in text alone by enabling models to draw additional contextual cues from paired images, thereby improving disambiguation and translation accuracy. Datasets such as Multi30K and Visual Genome have significantly advanced this line of research. However, these datasets do not always compel models to rely on visual information. The CoMMuTE dataset takes a stronger step in this direction by serving as an evaluation benchmark specifically designed around ambiguous English sentences that can only be correctly interpreted with their accompanying images. In this work, we extend CoMMuTE to two Indic languages, introducing IndicCoMMuTE — an evaluation dataset for assessing MMT systems on low-resource Indic languages. We benchmark a range of open-source multimodal Large Language Models (< 15B parameters) and a strong text-only baseline across eight languages. We fine-tune one of these LLMs on two Indic languages. Our findings provide insights into the strengths and limitations of LLMs and establish IndicCoMMuTE as a valuable benchmark for future research on Multimodal Machine Translation in Indic languages.

2024

This paper describes our submission to the unconstrained track of the ‘Dialectal and Low-Resource Task’ at IWSLT 2024. We designed cascaded Speech Translation systems for the language pairs Marathi-Hindi and Bhojpuri-Hindi, utilising and fine-tuning different pre-trained models for Automatic Speech Recognition (ASR) and Machine Translation (MT).

2022

Code-Mixed text data consists of sentences containing words or phrases from more than one language. Most multilingual communities worldwide communicate using multiple languages, with English usually being one of them. Hinglish is Code-Mixed text composed of Hindi and English but written in Roman script. This paper aims to determine the factors that influence the quality of system-generated Code-Mixed text. For the HinglishEval task, the proposed model uses multilingual BERT to measure the similarity between synthetically generated and human-generated sentences in order to predict the quality of synthetically generated Hinglish sentences.
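The core scoring idea can be sketched as comparing sentence embeddings with cosine similarity. The sketch below uses hypothetical placeholder vectors; in the actual system, the embeddings would come from multilingual BERT, and the variable names here are illustrative assumptions, not the paper's code:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical sentence embeddings; in practice these would be
# multilingual BERT representations of the two sentences.
synthetic_emb = np.array([0.2, 0.7, 0.1])   # system-generated Hinglish sentence
human_emb = np.array([0.25, 0.65, 0.15])    # human-generated reference sentence

# Higher similarity to the human reference suggests higher quality.
score = cosine_similarity(synthetic_emb, human_emb)
```

The similarity score (or features derived from it) can then feed a regressor that predicts the quality rating of the synthetic sentence.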

2021

This paper describes our team “NLPHut”’s submissions to the shared tasks at WAT 2021. We participated in the English→Hindi Multimodal translation task, the English→Malayalam Multimodal translation task, and the Indic Multilingual translation task. We used the state-of-the-art Transformer model with language tags in different settings for the translation tasks and proposed a novel “region-specific” caption generation approach combining an image CNN with an LSTM for Hindi and Malayalam image captioning. Our submission tops the English→Malayalam Multimodal translation task (text-only translation and Malayalam caption) and ranks second in the English→Hindi Multimodal translation task (text-only translation and Hindi caption). Our submissions also performed well in the Indic Multilingual translation tasks.

2020

This paper describes the ODIANLP submission to WAT 2020. We participated in the English→Hindi Multimodal task and the Indic task. We used the state-of-the-art Transformer model for the translation tasks and InceptionResNetV2 for the Hindi Image Captioning task. Our submissions top the English→Hindi Multimodal task in its track and the Odia↔English translation tasks. Our submissions also performed well in the Indic Multilingual tasks.