Implementing and Evaluating Multi-source Retrieval-Augmented Translation

Tommi Nieminen, Jörg Tiedemann, Sami Virpioja


Abstract
In recent years, neural machine translation (NMT) systems have been integrated with external databases with the aim of improving machine translation (MT) quality and enforcing domain-specific terminology and other conventions in the MT output. Most work on incorporating external knowledge into NMT has concentrated on integrating a single source of information, usually either a terminology database or a translation memory. However, in real-life translation scenarios, all relevant knowledge sources should be used in parallel. In this article, we evaluate different methods of integrating external knowledge from multiple sources in a single NMT system. In addition to training single models to utilize multiple kinds of information, we also ensemble models that have each been trained to utilize a single type of information. We evaluate our models against state-of-the-art LLMs using an extensive purpose-built English-to-Finnish test suite.
Anthology ID:
2025.wmt-1.20
Volume:
Proceedings of the Tenth Conference on Machine Translation
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Barry Haddow, Tom Kocmi, Philipp Koehn, Christof Monz
Venue:
WMT
Publisher:
Association for Computational Linguistics
Pages:
327–339
URL:
https://preview.aclanthology.org/ingest-emnlp/2025.wmt-1.20/
Cite (ACL):
Tommi Nieminen, Jörg Tiedemann, and Sami Virpioja. 2025. Implementing and Evaluating Multi-source Retrieval-Augmented Translation. In Proceedings of the Tenth Conference on Machine Translation, pages 327–339, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
Implementing and Evaluating Multi-source Retrieval-Augmented Translation (Nieminen et al., WMT 2025)
PDF:
https://preview.aclanthology.org/ingest-emnlp/2025.wmt-1.20.pdf