EdinHelsOW WMT 2025 CreoleMT System Description: Improving Lusophone Creole Translation through Data Augmentation, Model Merging and LLM Post-editing
Jacqueline Rowe, Ona De Gibert, Mateusz Klimaszewski, Coleman Haley, Alexandra Birch, Yves Scherrer
Abstract
In this work, we present our submissions to the unconstrained track of the System subtask of the WMT 2025 Creole Language Translation Shared Task. Of the 52 Creole languages included in the task, we focus on translation between English and seven Lusophone Creoles. Our approach builds on established strategies for low-resource machine translation, including back-translation and data distillation, fine-tuning of pre-trained multilingual models, and post-editing with large language models and lexicons. We also show that three further techniques each yield considerable gains in some translation directions: adding high-quality parallel Portuguese data to training, initialising Creole embeddings with Portuguese embedding weights, and strategically merging the best checkpoints of different fine-tuned models. Our best models outperform the baselines on the shared task test set in eight of fourteen translation directions. When evaluated on a more diverse test set, they surpass the baselines in all but one direction.
- Anthology ID:
- 2025.wmt-1.91
- Volume:
- Proceedings of the Tenth Conference on Machine Translation
- Month:
- November
- Year:
- 2025
- Address:
- Suzhou, China
- Editors:
- Barry Haddow, Tom Kocmi, Philipp Koehn, Christof Monz
- Venue:
- WMT
- Publisher:
- Association for Computational Linguistics
- Pages:
- 1166–1182
- URL:
- https://preview.aclanthology.org/ingest-emnlp/2025.wmt-1.91/
- Cite (ACL):
- Jacqueline Rowe, Ona De Gibert, Mateusz Klimaszewski, Coleman Haley, Alexandra Birch, and Yves Scherrer. 2025. EdinHelsOW WMT 2025 CreoleMT System Description: Improving Lusophone Creole Translation through Data Augmentation, Model Merging and LLM Post-editing. In Proceedings of the Tenth Conference on Machine Translation, pages 1166–1182, Suzhou, China. Association for Computational Linguistics.
- Cite (Informal):
- EdinHelsOW WMT 2025 CreoleMT System Description: Improving Lusophone Creole Translation through Data Augmentation, Model Merging and LLM Post-editing (Rowe et al., WMT 2025)
- PDF:
- https://preview.aclanthology.org/ingest-emnlp/2025.wmt-1.91.pdf
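The abstract attributes part of the gains to merging the best checkpoints of different fine-tuned models, but it does not say which merging method was used. The sketch below shows one common approach, assuming linear (optionally weighted) averaging of parameters across checkpoints that share the same architecture. It is an illustration, not the authors' implementation. The `merge_checkpoints` function and the dict-of-lists checkpoint format are hypothetical stand-ins for framework tensors such as PyTorch state dicts.

```python
from typing import Dict, List, Optional, Sequence

# Hypothetical checkpoint format: parameter name -> flat list of floats.
# Real frameworks store tensors, but the merging arithmetic is the same.
Checkpoint = Dict[str, List[float]]


def merge_checkpoints(checkpoints: Sequence[Checkpoint],
                      weights: Optional[Sequence[float]] = None) -> Checkpoint:
    """Linearly merge checkpoints of one architecture (assumed method).

    Each merged parameter is sum_i w_i * theta_i, where the weights are
    normalised to sum to 1. Omitting the weights gives a uniform average.
    """
    if not checkpoints:
        raise ValueError("need at least one checkpoint")
    if weights is None:
        weights = [1.0] * len(checkpoints)
    if len(weights) != len(checkpoints):
        raise ValueError("need exactly one weight per checkpoint")
    total = sum(weights)
    norm = [w / total for w in weights]

    # Merging only makes sense if every checkpoint has the same parameters.
    names = checkpoints[0].keys()
    for ckpt in checkpoints[1:]:
        if ckpt.keys() != names:
            raise ValueError("checkpoints must have identical parameter names")

    merged: Checkpoint = {}
    for name in names:
        size = len(checkpoints[0][name])
        if any(len(c[name]) != size for c in checkpoints):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        # Element-wise weighted sum across all checkpoints.
        merged[name] = [
            sum(w * c[name][j] for w, c in zip(norm, checkpoints))
            for j in range(size)
        ]
    return merged


if __name__ == "__main__":
    a = {"w": [1.0, 2.0], "b": [0.0]}
    b = {"w": [3.0, 4.0], "b": [2.0]}
    print(merge_checkpoints([a, b]))                  # uniform average
    print(merge_checkpoints([a, b], weights=[3, 1]))  # favour checkpoint a
```

Uniform averaging is the simplest choice. Non-uniform weights let a merge favour, for example, the checkpoint that scored best on a given translation direction. Whether the paper uses either variant, or a different merging scheme altogether, would need to be checked against the PDF.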