DoDS-IITPKD:Submissions to the WMT25 Low-Resource Indic Language Translation Task

Ontiwell Khongthaw, G.l. Salvin, Shrikant Budde, Abigairl Chigwededza, Dhruvadeep Malkar, Swapnil Hingmire


Abstract
Low-resource translation for Indic languages poses significant challenges due to limited parallel corpora and linguistic diversity. In this work, we describe our participation in the WMT 2025 shared task for four Indic languages: Khasi, Mizo, and Assamese in Category 1, and Bodo in Category 2. For our PRIMARY submission, we fine-tuned the distilled NLLB-200 model on bidirectional English↔Khasi and English↔Mizo data, and employed the IndicTrans2 model family for Assamese and Bodo translation. Our CONTRASTIVE submission augments training with external corpora from PMINDIA and Google SMOL to further enrich low-resource data coverage. Both systems leverage Low-Rank Adaptation (LoRA) within a parameter-efficient fine-tuning framework, enabling lightweight adapter training atop frozen pretrained weights. The translation pipeline was developed using the Hugging Face Transformers and PEFT libraries, augmented with bespoke preprocessing modules that append both language and domain identifiers to each instance. We evaluated our approach on parallel corpora spanning multiple domains (article-based, newswire, scientific, and biblical texts) as provided by the WMT25 dataset, under conditions of severe data scarcity. Fine-tuning lightweight LoRA adapters on targeted parallel corpora yields marked improvements in evaluation metrics, confirming their effectiveness for cross-domain adaptation in low-resource Indic languages.
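The preprocessing step the abstract describes, appending language and domain identifiers to each training instance, could be sketched roughly as below. The tag format, the language codes, and the function name are illustrative assumptions for this sketch; the paper does not specify its exact tagging scheme here.

```python
# Illustrative sketch of instance tagging for low-resource MT fine-tuning.
# The tag format ("<2xx>" target-language tag plus a "<dom:...>" domain tag)
# and the code mapping below are assumptions, not the authors' published scheme.

LANG_CODES = {
    "khasi": "kha",
    "mizo": "lus",
    "assamese": "asm",
    "bodo": "brx",
}

def tag_instance(src_text: str, tgt_lang: str, domain: str) -> str:
    """Prepend target-language and domain identifiers to a source sentence."""
    code = LANG_CODES.get(tgt_lang, tgt_lang)
    return f"<2{code}> <dom:{domain}> {src_text}"

# Example: an English->Khasi newswire instance.
print(tag_instance("The river is rising.", "khasi", "newswire"))
```

Tagged instances of this shape would then be tokenized and fed to the LoRA-adapted model, letting a single set of adapters condition on both target language and domain.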
Anthology ID:
2025.wmt-1.102
Volume:
Proceedings of the Tenth Conference on Machine Translation
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Barry Haddow, Tom Kocmi, Philipp Koehn, Christof Monz
Venue:
WMT
Publisher:
Association for Computational Linguistics
Pages:
1248–1252
URL:
https://preview.aclanthology.org/ingest-emnlp/2025.wmt-1.102/
Cite (ACL):
Ontiwell Khongthaw, G.l. Salvin, Shrikant Budde, Abigairl Chigwededza, Dhruvadeep Malkar, and Swapnil Hingmire. 2025. DoDS-IITPKD:Submissions to the WMT25 Low-Resource Indic Language Translation Task. In Proceedings of the Tenth Conference on Machine Translation, pages 1248–1252, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
DoDS-IITPKD:Submissions to the WMT25 Low-Resource Indic Language Translation Task (Khongthaw et al., WMT 2025)
PDF:
https://preview.aclanthology.org/ingest-emnlp/2025.wmt-1.102.pdf