2025
Fine-tuning NMT Models and LLMs for Specialised EN-ES Translation Using Aligned Corpora, Glossaries, and Synthetic Data: MULTITAN at WMT25 Terminology Shared Task
Lichao Zhu | Maria Zimina-Poirot | Stéphane Patin | Cristian Valdez
Proceedings of the Tenth Conference on Machine Translation
This paper describes our participation in the WMT25 Terminology Shared Task, specifically Track 1 (Spanish to English), focused on translation in the Information Technology (IT) domain. The shared task challenges participants to improve machine translation systems by effectively incorporating terminology constraints to ensure accurate and consistent translation of specialised technical terms. We experimented with several approaches to handling terminology and lexical constraints with both NMT systems and LLMs, using a small amount of training data and a glossary. Experimental results demonstrate that the systems behave differently with and without the glossary: the NMT system appears rather limited in adapting to the specialised lexicon and to resized embeddings, whereas the LLMs respond better to structured instructions. Through this participation, our objective is to improve terminology accuracy and overall translation quality, highlight the potential of specialised terminology-aware translation models for technical domains, and explore possibilities for fine-tuning LLMs and NMT models under domain and lexical constraints.
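The abstract mentions adapting an NMT system to a specialised lexicon by resizing its embeddings. Below is a minimal sketch of that general technique, not the authors' actual setup: it assumes a Hugging Face MarianMT checkpoint (Helsinki-NLP/opus-mt-en-es) and a few hypothetical IT-domain glossary terms added to the tokenizer, after which the embedding matrix is resized and would need fine-tuning on in-domain parallel data.

```python
# Sketch only: glossary terms, checkpoint, and workflow are illustrative assumptions.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "Helsinki-NLP/opus-mt-en-es"  # assumed EN-ES Marian checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Hypothetical glossary entries (source- or target-side terms) treated as atomic tokens.
glossary_terms = ["firmware", "endpoint", "punto de conexión"]
num_added = tokenizer.add_tokens(glossary_terms)

# Newly added tokens receive freshly initialised embedding rows; these only become
# useful after fine-tuning on the in-domain aligned corpus.
model.resize_token_embeddings(len(tokenizer))
print(f"Added {num_added} glossary tokens; new vocabulary size: {len(tokenizer)}")
```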
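The abstract also notes that the LLMs respond better to structured instructions carrying the terminology constraints. The sketch below shows one plausible way to inject glossary entries into a translation prompt; the prompt wording, translation direction, and glossary pairs are illustrative assumptions, not the paper's actual template.

```python
# Sketch only: build_prompt and the glossary entries are hypothetical examples of
# structured, glossary-constrained instructions for an LLM translator.
def build_prompt(source: str, glossary: dict[str, str]) -> str:
    constraints = "\n".join(
        f'- "{src}" must be translated as "{tgt}"' for src, tgt in glossary.items()
    )
    return (
        "Translate the following IT-domain sentence from English into Spanish.\n"
        "Terminology constraints:\n"
        f"{constraints}\n\n"
        f"Source: {source}\n"
        "Translation:"
    )

glossary = {"endpoint": "punto de conexión", "firmware": "firmware"}
print(build_prompt("Restart the endpoint after updating the firmware.", glossary))
```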