Arnab Kumar Maji


2025

Findings of WMT 2025 Shared Task on Low-resource Indic Languages Translation
Partha Pakray | Reddi Krishna | Santanu Pal | Advaitha Vetagiri | Sandeep Dash | Arnab Kumar Maji | Saralin A. Lyngdoh | Lenin Laitonjam | Anupam Jamatia | Koj Sambyo | Ajit Das | Riyanka Manna
Proceedings of the Tenth Conference on Machine Translation

This study presents the results of the low-resource Indic language translation task organized in collaboration with the Tenth Conference on Machine Translation (WMT) 2025. In this task, participants were required to build machine translation models for seven language pairs, divided into two categories. Category 1 covers languages with a moderate amount of training data, i.e., English–Assamese, English–Mizo, English–Khasi, English–Manipuri, and English–Nyishi. Category 2 covers languages with very limited training data, i.e., English–Bodo and English–Kokborok. The task leverages the enriched IndicNE-Corp1.0 dataset, which consists of an extensive collection of parallel and monolingual corpora for northeastern Indic languages. Participant submissions were evaluated using automatic machine translation metrics, including BLEU, TER, ROUGE-L, ChrF, and METEOR. In addition to these metrics, this year’s evaluation also includes cosine similarity, which compares the semantic representations of sentences, to measure the performance and accuracy of the models. This work aims to promote innovation and advancement in low-resource Indic language translation.

2024

Findings of WMT 2024 Shared Task on Low-Resource Indic Languages Translation
Partha Pakray | Santanu Pal | Advaitha Vetagiri | Reddi Krishna | Arnab Kumar Maji | Sandeep Dash | Lenin Laitonjam | Lyngdoh Sarah | Riyanka Manna
Proceedings of the Ninth Conference on Machine Translation

This paper presents the results of the low-resource Indic language translation task, organized in conjunction with the Ninth Conference on Machine Translation (WMT) 2024. In this edition, participants were challenged to develop machine translation models for four distinct language pairs: English–Assamese, English–Mizo, English–Khasi, and English–Manipuri. The task utilized the enriched IndicNE-Corp1.0 dataset, which includes an extensive collection of parallel and monolingual corpora for northeastern Indic languages. The evaluation was conducted through a comprehensive suite of automatic metrics (BLEU, TER, RIBES, METEOR, and ChrF), supplemented by meticulous human assessment to measure the translation systems’ performance and accuracy. This initiative aims to drive advancements in low-resource machine translation and make a substantial contribution to the growing body of knowledge in this dynamic field.