Machine Translation for Low-Resource Languages through Monolingual Data and LLM: A Case Study of English-to-Basque

Nam Luu, Aitor Soroa, German Rigau, Ondřej Bojar


Abstract
Developing a machine translation (MT) system requires a considerable amount of high-quality parallel data, which is often scarce for low-resource languages. This paper explores the use of synthetic data for training an LLM-based MT system in the English-to-Basque direction. Starting from Basque monolingual corpora, we apply back-translation to generate parallel corpora, taking advantage of the fact that current LLMs do not translate well from English to Basque but perform acceptably in the reverse direction. We conduct experiments in a multi-stage approach, from a simple Supervised Fine-Tuning (SFT) step to preference learning with Direct Preference Optimization (DPO). We then evaluate the approach with both automatic metrics and manual assessment. Experimental results suggest that, for this task, SFT brings a clear improvement in translation quality, while DPO yields only a marginal gain.
Anthology ID:
2026.eacl-srw.6
Volume:
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 4: Student Research Workshop)
Month:
March
Year:
2026
Address:
Rabat, Morocco
Editors:
Selene Baez Santamaria, Sai Ashish Somayajula, Atsuki Yamaguchi
Venue:
EACL
Publisher:
Association for Computational Linguistics
Pages:
60–91
URL:
https://preview.aclanthology.org/ingest-eacl/2026.eacl-srw.6/
Cite (ACL):
Nam Luu, Aitor Soroa, German Rigau, and Ondřej Bojar. 2026. Machine Translation for Low-Resource Languages through Monolingual Data and LLM: A Case Study of English-to-Basque. In Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 4: Student Research Workshop), pages 60–91, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal):
Machine Translation for Low-Resource Languages through Monolingual Data and LLM: A Case Study of English-to-Basque (Luu et al., EACL 2026)
PDF:
https://preview.aclanthology.org/ingest-eacl/2026.eacl-srw.6.pdf