Improving LLMs for Machine Translation Using Synthetic Preference Data

Dario Vajda, Domen Vreš, Marko Robnik Šikonja
Abstract
Large language models have emerged as effective machine translation systems. In this paper, we explore how a general instruction-tuned large language model can be improved for machine translation using relatively few, easily produced data resources. Using Slovene as a use case, we improve the GaMS-9B-Instruct model with Direct Preference Optimization (DPO) training on a programmatically curated and enhanced subset of a public dataset. As DPO requires pairs of quality-ranked instances, we generated its training dataset by translating English Wikipedia articles with two LLMs, GaMS-9B-Instruct and EuroLLM-9B-Instruct. We ranked the resulting translations using heuristics coupled with automatic evaluation metrics such as COMET. The evaluation shows that our fine-tuned model outperforms both models involved in the dataset generation. Compared to the two baseline models, the fine-tuned model achieved COMET score gains of around 0.04 and 0.02, respectively, when translating Wikipedia articles. It also more consistently avoids language and formatting errors.
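The preference-pair construction the abstract describes can be sketched as follows. This is a minimal, hypothetical illustration, not the authors' code: two candidate translations per source segment are compared under an automatic quality score (the paper uses COMET plus additional heuristics), and the higher-scoring candidate becomes the "chosen" response in a DPO-style record. The `toy_score` function below is a crude stand-in that merely penalizes untranslated source words copied into the output; a real pipeline would call a trained COMET model instead.

```python
def build_dpo_pairs(sources, cand_a, cand_b, score):
    """Build DPO-style preference records from two candidate translations.

    For each source segment, the candidate with the higher score becomes
    "chosen" and the other "rejected".
    """
    pairs = []
    for src, a, b in zip(sources, cand_a, cand_b):
        sa, sb = score(src, a), score(src, b)
        chosen, rejected = (a, b) if sa >= sb else (b, a)
        pairs.append({
            "prompt": f"Translate into Slovene: {src}",
            "chosen": chosen,
            "rejected": rejected,
        })
    return pairs


def toy_score(src, hyp):
    """Toy stand-in for a COMET-like metric: penalize source words that
    appear verbatim in the hypothesis (i.e. were left untranslated)."""
    return -len(set(src.lower().split()) & set(hyp.lower().split()))


pairs = build_dpo_pairs(
    sources=["The cat sleeps."],
    cand_a=["Mačka spi."],          # fully translated candidate
    cand_b=["Cat spi sleeps."],     # candidate with copied English words
    score=toy_score,
)
```

The resulting records match the `prompt`/`chosen`/`rejected` schema expected by common DPO trainers, so a real implementation would only need to swap in a genuine metric and the two model outputs.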
Anthology ID:
2025.luhme-1.7
Volume:
Proceedings of the 2nd LUHME Workshop
Month:
October
Year:
2025
Address:
Bologna, Italy
Editors:
Henrique Lopes Cardoso, Rui Sousa-Silva, Maarit Koponen, Antonio Pareja-Lora
Venue:
LUHME
Publisher:
LUHME
Pages:
67–73
URL:
https://preview.aclanthology.org/ingest-luhme/2025.luhme-1.7/
Cite (ACL):
Dario Vajda, Domen Vreš, and Marko Robnik Šikonja. 2025. Improving LLMs for Machine Translation Using Synthetic Preference Data. In Proceedings of the 2nd LUHME Workshop, pages 67–73, Bologna, Italy. LUHME.
Cite (Informal):
Improving LLMs for Machine Translation Using Synthetic Preference Data (Vajda et al., LUHME 2025)
PDF:
https://preview.aclanthology.org/ingest-luhme/2025.luhme-1.7.pdf