Instruction-Tuned English to Bhojpuri Neural Machine Translation Using Contrastive Preference Optimization

Kshetrimayum Boynao Singh, Deepak Kumar, Asif Ekbal


Abstract
This paper presents an English-to-Bhojpuri machine translation (MT) system developed for the WMT25 General MT Shared Task. Given the low-resource nature of Bhojpuri, we adopt a two-stage training pipeline: unsupervised pretraining followed by supervised fine-tuning. During pretraining, we use a 300,000-sentence corpus comprising 70% Bhojpuri monolingual data and 30% English data to establish language grounding. The fine-tuning stage uses 29,749 bilingual English-Bhojpuri sentence pairs (spanning the training, validation, and test sets). To adapt the system to instruction-following scenarios, we apply Contrastive Preference Optimization (CPO), an optimization strategy that enables the model to capture fine-grained translation preferences and maintain semantic fidelity in instruction-tuned settings. We evaluate our system across multiple metrics, demonstrating moderate performance on low-resource MT, particularly across diverse domains such as literary, news, social, and speech.
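For readers unfamiliar with CPO, the sketch below illustrates the objective as introduced by Xu et al. (2024): a reference-free preference term (a DPO-style contrastive loss without a frozen reference model) plus a negative log-likelihood term on the preferred translation. This is a minimal PyTorch illustration under stated assumptions, not the paper's implementation; the names (cpo_loss, beta, the log-probability tensors) are hypothetical.

import torch
import torch.nn.functional as F

def cpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Minimal sketch of the CPO objective (Xu et al., 2024).

    policy_chosen_logps   : log pi_theta(y_w | x), sequence log-probs of
                            preferred translations, shape (batch,)
    policy_rejected_logps : log pi_theta(y_l | x), sequence log-probs of
                            dispreferred translations, shape (batch,)
    """
    # Preference term: -log sigmoid(beta * (logp_w - logp_l)).
    # Unlike DPO, no frozen reference model appears in the margin.
    margin = beta * (policy_chosen_logps - policy_rejected_logps)
    prefer_loss = -F.logsigmoid(margin)

    # NLL (behavior-cloning) term keeps the policy close to the
    # preferred translations.
    nll_loss = -policy_chosen_logps

    return (prefer_loss + nll_loss).mean()

Given per-sequence log-probabilities of shape (batch,), cpo_loss returns a scalar suitable for backward(); how those log-probabilities and the preference pairs are produced for English-Bhojpuri is specific to the paper's setup.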
Anthology ID:
2025.wmt-1.38
Volume:
Proceedings of the Tenth Conference on Machine Translation
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Barry Haddow, Tom Kocmi, Philipp Koehn, Christof Monz
Venue:
WMT
Publisher:
Association for Computational Linguistics
Pages:
638–643
URL:
https://preview.aclanthology.org/ingest-emnlp/2025.wmt-1.38/
Cite (ACL):
Kshetrimayum Boynao Singh, Deepak Kumar, and Asif Ekbal. 2025. Instruction-Tuned English to Bhojpuri Neural Machine Translation Using Contrastive Preference Optimization. In Proceedings of the Tenth Conference on Machine Translation, pages 638–643, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
Instruction-Tuned English to Bhojpuri Neural Machine Translation Using Contrastive Preference Optimization (Singh et al., WMT 2025)
PDF:
https://preview.aclanthology.org/ingest-emnlp/2025.wmt-1.38.pdf