JU-NLP: Improving Low-Resource Indic Translation System with Efficient LoRA-Based Adaptation
Priyobroto Acharya, Haranath Mondal, Dipanjan Saha, Dipankar Das, Sivaji Bandyopadhyay
Abstract
Low-resource Indic languages such as Assamese, Manipuri, Mizo, and Bodo face persistent challenges in NMT due to limited parallel data, diverse scripts, and complex morphology. We address these issues in the WMT 2025 shared task by introducing a unified multilingual NMT framework that combines rigorous language-specific preprocessing with parameter-efficient adaptation of large-scale models. Our pipeline integrates the NLLB-200 and IndicTrans2 architectures, fine-tuned using LoRA and DoRA, reducing trainable parameters by over 90% without degrading translation quality. A comprehensive preprocessing suite, including Unicode normalization, semantic filtering, transliteration, and noise reduction, ensures high-quality inputs, while script-aware post-processing mitigates evaluation bias from orthographic mismatches. Experiments across English-Indic directions demonstrate that NLLB-200 achieves superior results for Assamese, Manipuri, and Mizo, whereas IndicTrans2 excels in English-Bodo. Evaluated using BLEU, chrF, METEOR, ROUGE-L, and TER, our approach yields consistent improvements over baselines, underscoring the effectiveness of combining efficient fine-tuning with linguistically informed preprocessing for low-resource Indic MT.
- Anthology ID:
- 2025.wmt-1.95
- Volume:
- Proceedings of the Tenth Conference on Machine Translation
- Month:
- November
- Year:
- 2025
- Address:
- Suzhou, China
- Editors:
- Barry Haddow, Tom Kocmi, Philipp Koehn, Christof Monz
- Venue:
- WMT
- Publisher:
- Association for Computational Linguistics
- Pages:
- 1201–1209
- URL:
- https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.wmt-1.95/
- DOI:
- 10.18653/v1/2025.wmt-1.95
- Cite (ACL):
- Priyobroto Acharya, Haranath Mondal, Dipanjan Saha, Dipankar Das, and Sivaji Bandyopadhyay. 2025. JU-NLP: Improving Low-Resource Indic Translation System with Efficient LoRA-Based Adaptation. In Proceedings of the Tenth Conference on Machine Translation, pages 1201–1209, Suzhou, China. Association for Computational Linguistics.
- Cite (Informal):
- JU-NLP: Improving Low-Resource Indic Translation System with Efficient LoRA-Based Adaptation (Acharya et al., WMT 2025)
- PDF:
- https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.wmt-1.95.pdf
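
The abstract's claim that LoRA cuts trainable parameters by over 90% can be illustrated with a small back-of-the-envelope calculation. The sketch below assumes the standard LoRA formulation, in which the update to a frozen weight matrix of shape `d_out x d_in` is factored into two trainable matrices of rank `r`. The dimensions and rank used here (1024, 1024, 16) are hypothetical values chosen for illustration and are not taken from the paper; the function names are likewise illustrative. The calculation covers a single adapted matrix, whereas the paper's figure presumably refers to the whole model.

```python
def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Count parameters in the two low-rank factors.

    A has shape (rank, d_in) and B has shape (d_out, rank); only these
    are trained, while the original d_out x d_in weight stays frozen.
    """
    return rank * d_in + d_out * rank


def reduction_fraction(d_in: int, d_out: int, rank: int) -> float:
    """Fraction of trainable parameters saved versus updating the full matrix."""
    full = d_in * d_out
    return 1.0 - lora_trainable_params(d_in, d_out, rank) / full


if __name__ == "__main__":
    # Hypothetical layer size and rank, not values reported by the authors.
    d_in, d_out, rank = 1024, 1024, 16
    print("LoRA trainable parameters:", lora_trainable_params(d_in, d_out, rank))
    print(f"Reduction vs. full matrix: {reduction_fraction(d_in, d_out, rank):.2%}")
```

Under these assumed values the low-rank factors hold 32,768 parameters against 1,048,576 for the full matrix. The saving shrinks as the rank grows, so the rank is the main knob trading adapter capacity against parameter count.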