KozKreolMRU WMT 2025 CreoleMT System Description: Koz Kreol: Multi-Stage Training for English–Mauritian Creole MT

Yush Rajcoomar


Abstract
Mauritian Creole (Kreol Morisyen), spoken by approximately 1.5 million people worldwide, faces significant challenges in digital language technology due to limited computational resources. This paper presents “Koz Kreol”, a comprehensive approach to English–Mauritian Creole machine translation using a three-stage training methodology: monolingual pretraining, parallel data training, and LoRA fine-tuning. We achieve state-of-the-art results with a 28.82 BLEU score for EN→MFE translation, representing a 74% improvement over ChatGPT-4o. Our work addresses critical data scarcity through the use of existing datasets, synthetic data generation, and community-sourced translations. The methodology provides a replicable framework for other low-resource Creole languages while supporting digital inclusion and cultural preservation for the Mauritian community. This paper consists of both a systems and data subtask submission as part of a Creole MT Shared Task.
Anthology ID:
2025.wmt-1.92
Volume:
Proceedings of the Tenth Conference on Machine Translation
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Barry Haddow, Tom Kocmi, Philipp Koehn, Christof Monz
Venue:
WMT
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
1183–1190
Language:
URL:
https://preview.aclanthology.org/ingest-emnlp/2025.wmt-1.92/
DOI:
Bibkey:
Cite (ACL):
Yush Rajcoomar. 2025. KozKreolMRU WMT 2025 CreoleMT System Description: Koz Kreol: Multi-Stage Training for English–Mauritian Creole MT. In Proceedings of the Tenth Conference on Machine Translation, pages 1183–1190, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
KozKreolMRU WMT 2025 CreoleMT System Description: Koz Kreol: Multi-Stage Training for English–Mauritian Creole MT (Rajcoomar, WMT 2025)
Copy Citation:
PDF:
https://preview.aclanthology.org/ingest-emnlp/2025.wmt-1.92.pdf