Maria Zimina-Poirot
2025
Fine-tuning NMT Models and LLMs for Specialised EN-ES Translation Using Aligned Corpora, Glossaries, and Synthetic Data: MULTITAN at WMT25 Terminology Shared Task
Lichao Zhu | Maria Zimina-Poirot | Stephane Patin | Cristian Valdez
Proceedings of the Tenth Conference on Machine Translation
This paper describes our participation in the WMT25 Terminology Shared Task, specifically Track 1 (Spanish to English), which focuses on translation within the Information Technology (IT) domain. The shared task challenges participants to improve machine translation systems by effectively incorporating terminology constraints to ensure accurate and consistent translation of specialised technical terms. We experimented with several approaches to handling terminology and lexical constraints in both NMT systems and LLMs, using a small amount of training data and a glossary. Experimental results demonstrate that systems behave differently with and without a glossary. The NMT system seems rather limited in adapting to a specialised lexicon and to resized embeddings, in contrast to LLMs, which prefer structured instructions. Through this participation, our objective is to improve terminology accuracy and overall translation quality, highlight the potential of specialised terminology-aware translation models for technical domains, and explore the possibilities of fine-tuning LLMs and NMT models under domain and lexical constraints.
2021
The SPECTRANS System Description for the WMT21 Terminology Task
Nicolas Ballier | Dahn Cho | Bilal Faye | Zong-You Ke | Hanna Martikainen | Mojca Pecman | Guillaume Wisniewski | Jean-Baptiste Yunès | Lichao Zhu | Maria Zimina-Poirot
Proceedings of the Sixth Conference on Machine Translation
This paper discusses the WMT 2021 terminology shared task from a “meta” perspective. We present the results of our experiments using the terminology dataset and the OpenNMT (Klein et al., 2017) and JoeyNMT (Kreutzer et al., 2019) toolkits for the language direction English to French. Our experiment 1 compares the predictions of the two toolkits. Experiment 2 uses OpenNMT to fine-tune the model. We report our results for the task with the evaluation script but mostly discuss the linguistic properties of the terminology dataset provided for the task. We provide evidence of the importance of text genres across scores, having replicated the evaluation scripts.
Co-authors
- Lichao Zhu 2
- Nicolas Ballier 1
- Dahn Cho 1
- Bilal Faye 1
- Zong-You Ke 1
Venues
- wmt 2