Towards Improving Multimodal Machine Translation with LLMs: A Focus on Indic Languages

Amulya Ratna Dash, Chirag Wadhwa, Yashvardhan Sharma


Abstract
Recent advances in Multimodal Machine Translation (MMT) aim to resolve ambiguity and polysemy that text alone cannot settle, by enabling models to draw additional contextual cues from paired images and thereby improve disambiguation and translation accuracy. Datasets such as Multi30K and Visual Genome have significantly advanced this line of research. However, these datasets do not always compel models to rely on visual information. The CoMMuTE dataset takes a stronger step in this direction: it is an evaluation benchmark designed specifically around ambiguous English sentences that can only be correctly interpreted with their accompanying images. In this work, we extend CoMMuTE to two Indic languages, introducing IndicCoMMuTE, an evaluation dataset for assessing MMT systems on low-resource Indic languages. We benchmark a range of open-source multimodal Large Language Models (< 15B parameters) and a strong text-only baseline across eight languages. We also fine-tune one of these LLMs on two Indic languages. Our findings provide insights into the strengths and limitations of LLMs and establish IndicCoMMuTE as a valuable benchmark for future research on Multimodal Machine Translation in Indic languages.
Anthology ID:
2026.lrec-main.698
Volume:
Proceedings of the Fifteenth Language Resources and Evaluation Conference
Month:
May
Year:
2026
Address:
Palma de Mallorca, Spain
Editors:
Stelios Piperidis, Núria Bel, Henk van den Heuvel, Nancy Ide, Simon Krek, Antonio Toral
Venue:
LREC
Publisher:
ELRA Language Resource Association
Pages:
8872–8882
URL:
https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.698/
Cite (ACL):
Amulya Ratna Dash, Chirag Wadhwa, and Yashvardhan Sharma. 2026. Towards Improving Multimodal Machine Translation with LLMs: A Focus on Indic Languages. International Conference on Language Resources and Evaluation, main:8872–8882.
Cite (Informal):
Towards Improving Multimodal Machine Translation with LLMs: A Focus on Indic Languages (Dash et al., LREC 2026)
PDF:
https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.698.pdf