Self-Retrieval from Distant Contexts for Document-Level Machine Translation

Ziqian Peng, Rachel Bawden, François Yvon


Abstract
Document-level machine translation is a challenging task, as it requires modeling both short-range and long-range dependencies to maintain the coherence and cohesion of the generated translation. However, these dependencies are sparse, and most context-augmented translation systems resort to one of two equally unsatisfactory options: either including maximally long contexts, in the hope that the useful dependencies are not lost in the noise, or using limited local contexts, at the risk of missing relevant information. In this work, we study a self-retrieval-augmented machine translation framework (Self-RAMT), aimed at informing translation decisions with informative local and global contexts dynamically extracted from the source and target texts. We examine the effectiveness of this method using three large language models, considering three criteria for context selection. We carry out experiments on TED talks as well as parallel scientific articles, considering three translation directions. Our results show that integrating distant contexts with Self-RAMT improves translation quality as measured by reference-based scores and consistency metrics.
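The abstract describes Self-RAMT only at a high level, so the sketch below is a speculative illustration of what self-retrieval over a document's own context could look like: previously translated sentence pairs are scored against the current source sentence and the top-scoring pairs are prepended to the translation prompt. The token-overlap criterion, the function names (overlap_score, select_contexts, build_prompt) and the prompt layout are assumptions made for illustration; they are not the selection criteria or prompts used in the paper.

```python
# Minimal, hypothetical sketch of self-retrieval-augmented context selection.
# NOT the authors' implementation: the scoring criterion (token overlap),
# the prompt format and all names here are illustrative assumptions.

from typing import List, Tuple


def overlap_score(query: str, candidate: str) -> float:
    """Jaccard overlap between token sets (stand-in selection criterion)."""
    q, c = set(query.lower().split()), set(candidate.lower().split())
    return len(q & c) / len(q | c) if q | c else 0.0


def select_contexts(
    current_src: str,
    prev_src: List[str],
    prev_tgt: List[str],
    k: int = 2,
) -> List[Tuple[str, str]]:
    """Pick the k previous sentence pairs most similar to the current source."""
    scored = sorted(
        zip(prev_src, prev_tgt),
        key=lambda pair: overlap_score(current_src, pair[0]),
        reverse=True,
    )
    return scored[:k]


def build_prompt(current_src: str, contexts: List[Tuple[str, str]]) -> str:
    """Assemble a translation prompt that includes the retrieved distant context."""
    lines = ["Relevant context from earlier in the document:"]
    for src, tgt in contexts:
        lines.append(f"Source: {src}\nTranslation: {tgt}")
    lines.append(f"Translate the next sentence: {current_src}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Toy document history (hypothetical data, English-French for illustration).
    prev_src = [
        "The speaker introduced the new telescope.",
        "It was built over ten years.",
        "Funding came from three agencies.",
    ]
    prev_tgt = [
        "L'orateur a présenté le nouveau télescope.",
        "Il a été construit en dix ans.",
        "Le financement provenait de trois agences.",
    ]
    current = "The telescope will observe distant galaxies."
    ctx = select_contexts(current, prev_src, prev_tgt, k=2)
    print(build_prompt(current, ctx))
```

In an actual system, the scoring function would presumably be replaced by whichever selection criterion is under study, and the assembled prompt would be passed to a large language model for translation.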
Anthology ID:
2025.wmt-1.13
Volume:
Proceedings of the Tenth Conference on Machine Translation
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Barry Haddow, Tom Kocmi, Philipp Koehn, Christof Monz
Venue:
WMT
Publisher:
Association for Computational Linguistics
Pages:
220–240
URL:
https://preview.aclanthology.org/ingest-emnlp/2025.wmt-1.13/
Cite (ACL):
Ziqian Peng, Rachel Bawden, and François Yvon. 2025. Self-Retrieval from Distant Contexts for Document-Level Machine Translation. In Proceedings of the Tenth Conference on Machine Translation, pages 220–240, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
Self-Retrieval from Distant Contexts for Document-Level Machine Translation (Peng et al., WMT 2025)
PDF:
https://preview.aclanthology.org/ingest-emnlp/2025.wmt-1.13.pdf