G.l. Salvin

2025

DoDS-IITPKD: Submissions to the WMT25 Low-Resource Indic Language Translation Task
Ontiwell Khongthaw | G.l. Salvin | Shrikant Budde | Abigairl Chigwededza | Dhruvadeep Malkar | Swapnil Hingmire
Proceedings of the Tenth Conference on Machine Translation

Low-resource translation for Indic languages poses significant challenges due to limited parallel corpora and linguistic diversity. In this work, we describe our participation in the WMT 2025 shared task for four Indic languages: Khasi, Mizo, and Assamese in Category 1, and Bodo in Category 2. For our PRIMARY submission, we fine-tuned the distilled NLLB-200 model on bidirectional English↔Khasi and English↔Mizo data, and employed the IndicTrans2 model family for Assamese and Bodo translation. Our CONTRASTIVE submission augments training with external corpora from PMINDIA and Google SMOL to further enrich low-resource data coverage. Both systems leverage Low-Rank Adaptation (LoRA) within a parameter-efficient fine-tuning framework, enabling lightweight adapter training atop frozen pretrained weights. The translation pipeline was developed using the Hugging Face Transformers and PEFT libraries, augmented with bespoke preprocessing modules that append both language and domain identifiers to each instance. We evaluated our approach on the parallel corpora provided by the WMT25 dataset, which span multiple domains (article-based, newswire, scientific, and biblical texts), under conditions of severe data scarcity. Fine-tuning lightweight LoRA adapters on targeted parallel corpora yields marked improvements in evaluation metrics, confirming their effectiveness for cross-domain adaptation in low-resource Indic languages.
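The preprocessing step mentioned in the abstract, appending language and domain identifiers to each instance, could look roughly like the sketch below. The abstract does not give the tag format, so the angle-bracket layout, the `Instance` class, the `tag_instance` function, and the short codes `en`, `kha`, and `news` are all hypothetical choices made for illustration, not the authors' implementation.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    """One training example (hypothetical structure, not from the paper)."""
    text: str      # source-side sentence
    src_lang: str  # assumed short language code, e.g. "en"
    tgt_lang: str  # assumed short language code, e.g. "kha"
    domain: str    # assumed domain label, e.g. "news"


def tag_instance(inst: Instance) -> str:
    """Append a language-pair tag and a domain tag to the source text.

    The "<src-tgt> <domain>" format is an assumption; the abstract only
    states that both identifiers are appended to each instance.
    """
    return f"{inst.text} <{inst.src_lang}-{inst.tgt_lang}> <{inst.domain}>"


if __name__ == "__main__":
    example = Instance("Hello", "en", "kha", "news")
    print(tag_instance(example))
```

In a setup like the one described, strings tagged this way would then be tokenized and passed to the LoRA-adapted model; that part is omitted here because the abstract does not specify the adapter configuration.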
