Fine-tuning NMT Models and LLMs for Specialised EN-ES Translation Using Aligned Corpora, Glossaries, and Synthetic Data: MULTITAN at WMT25 Terminology Shared Task

Lichao Zhu, Maria Zimina-Poirot, Stephane Patin, Cristian Valdez


Abstract
This paper describes our participation in the WMT25 Terminology Shared Task, specifically Track 1 (Spanish to English), focused on translation within the Information Technology (IT) domain. The shared task challenges participants to improve machine translation systems by effectively incorporating terminology constraints to ensure accurate and consistent translation of specialised technical terms. We experimented with several approaches to handling terminology and lexical constraints with both NMT systems and LLMs, using a small amount of training data and a glossary. Experimental results demonstrate that the systems behave differently with and without the glossary. The NMT system appears rather limited in adapting to a specialised lexicon and resized embeddings, whereas the LLMs respond well to structured instructions. Through this participation, our objectives are to improve terminology accuracy and overall translation quality, highlight the potential of specialised terminology-aware translation models for technical domains, and explore possibilities for fine-tuning LLMs and NMT models under domain and lexical constraints.
Anthology ID:
2025.wmt-1.108
Volume:
Proceedings of the Tenth Conference on Machine Translation
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Barry Haddow, Tom Kocmi, Philipp Koehn, Christof Monz
Venue:
WMT
Publisher:
Association for Computational Linguistics
Pages:
1284–1291
URL:
https://preview.aclanthology.org/ingest-emnlp/2025.wmt-1.108/
Cite (ACL):
Lichao Zhu, Maria Zimina-Poirot, Stephane Patin, and Cristian Valdez. 2025. Fine-tuning NMT Models and LLMs for Specialised EN-ES Translation Using Aligned Corpora, Glossaries, and Synthetic Data: MULTITAN at WMT25 Terminology Shared Task. In Proceedings of the Tenth Conference on Machine Translation, pages 1284–1291, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
Fine-tuning NMT Models and LLMs for Specialised EN-ES Translation Using Aligned Corpora, Glossaries, and Synthetic Data: MULTITAN at WMT25 Terminology Shared Task (Zhu et al., WMT 2025)
PDF:
https://preview.aclanthology.org/ingest-emnlp/2025.wmt-1.108.pdf