Findings of WMT 2025 Shared Task on Low-resource Indic Languages Translation

Partha Pakray, Reddi Krishna, Santanu Pal, Advaitha Vetagiri, Sandeep Dash, Arnab Kumar Maji, Saralin A. Lyngdoh, Lenin Laitonjam, Anupam Jamatia, Koj Sambyo, Ajit Das, Riyanka Manna


Abstract
This study presents the results of the low-resource Indic language translation task organized in collaboration with the Tenth Conference on Machine Translation (WMT) 2025. In this shared task, participants were required to build machine translation models for seven language pairs, grouped into two categories. Category 1 covers language pairs with a moderate amount of training data: English–Assamese, English–Mizo, English–Khasi, English–Manipuri and English–Nyishi. Category 2 covers language pairs with very limited training data: English–Bodo and English–Kokborok. The task leverages the enriched IndicNE-corp1.0 dataset, which consists of an extensive collection of parallel and monolingual corpora for northeastern Indic languages. Participant submissions were evaluated using automatic machine translation metrics, including BLEU, TER, ROUGE-L, ChrF, and METEOR. In addition to these metrics, this year's evaluation also includes cosine similarity, which compares semantic representations of sentences to measure the performance and accuracy of the models. This work aims to promote innovation and advancement in machine translation for low-resource Indic languages.
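Cosine similarity scores a hypothesis against a reference by the angle between their sentence embeddings, cos(u, v) = (u · v) / (‖u‖ ‖v‖). The abstract does not state which sentence encoder or which system-level aggregation the organizers used. The sketch below therefore assumes the embeddings are already computed and that a system score is the plain mean of per-sentence similarities. `cosine_similarity` and `corpus_cosine` are hypothetical helpers for illustration, not the task's official scorer.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Return cos(u, v) = (u . v) / (||u|| * ||v||) for two embedding vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Cosine is undefined for a zero vector; this sketch treats it as 0.
        return 0.0
    return dot / (norm_u * norm_v)


def corpus_cosine(hyp_embs: Sequence[Sequence[float]],
                  ref_embs: Sequence[Sequence[float]]) -> float:
    """Mean sentence-level cosine over aligned hypothesis/reference pairs.

    Averaging is an assumption of this sketch, not the documented WMT 2025 procedure.
    """
    if len(hyp_embs) != len(ref_embs) or not hyp_embs:
        raise ValueError("need the same non-zero number of hypotheses and references")
    scores = [cosine_similarity(h, r) for h, r in zip(hyp_embs, ref_embs)]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy 3-dimensional vectors standing in for real sentence embeddings.
    hyps = [[1.0, 0.0, 1.0], [0.5, 0.5, 0.0]]
    refs = [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
    print(round(corpus_cosine(hyps, refs), 4))
```

With the toy vectors, the first pair is identical (cosine 1.0) and the second pair has cosine about 0.7071, so the averaged score is about 0.8536. Unlike surface-overlap metrics such as BLEU or ChrF, this score can reward a paraphrase that shares few tokens with the reference, provided the encoder places the two sentences close together.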
Anthology ID:
2025.wmt-1.29
Volume:
Proceedings of the Tenth Conference on Machine Translation
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Barry Haddow, Tom Kocmi, Philipp Koehn, Christof Monz
Venue:
WMT
Publisher:
Association for Computational Linguistics
Pages:
532–553
URL:
https://preview.aclanthology.org/ingest-emnlp/2025.wmt-1.29/
Cite (ACL):
Partha Pakray, Reddi Krishna, Santanu Pal, Advaitha Vetagiri, Sandeep Dash, Arnab Kumar Maji, Saralin A. Lyngdoh, Lenin Laitonjam, Anupam Jamatia, Koj Sambyo, Ajit Das, and Riyanka Manna. 2025. Findings of WMT 2025 Shared Task on Low-resource Indic Languages Translation. In Proceedings of the Tenth Conference on Machine Translation, pages 532–553, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
Findings of WMT 2025 Shared Task on Low-resource Indic Languages Translation (Pakray et al., WMT 2025)
PDF:
https://preview.aclanthology.org/ingest-emnlp/2025.wmt-1.29.pdf