Investigating Length Issues in Document-level Machine Translation

Ziqian Peng, Rachel Bawden, François Yvon


Abstract
Transformer architectures are increasingly effective at processing and generating very long chunks of text, opening new perspectives for document-level machine translation (MT). In this work, we challenge the ability of MT systems to handle texts comprising up to several thousand tokens. We design and implement a new approach to precisely measure the effect of length increments on MT outputs. Our experiments with two representative architectures unambiguously show that (a) translation performance decreases with the length of the input text; (b) the position of sentences within the document matters, and translation quality is higher for sentences occurring earlier in a document. We further show that manipulating the distribution of document lengths and of positional embeddings only marginally mitigates such problems. Our results suggest that even though document-level MT is computationally feasible, it does not yet match the performance of sentence-based MT.
Anthology ID:
2025.mtsummit-1.3
Volume:
Proceedings of Machine Translation Summit XX: Volume 1
Month:
June
Year:
2025
Address:
Geneva, Switzerland
Editors:
Pierrette Bouillon, Johanna Gerlach, Sabrina Girletti, Lise Volkart, Raphael Rubino, Rico Sennrich, Ana C. Farinha, Marco Gaido, Joke Daems, Dorothy Kenny, Helena Moniz, Sara Szoc
Venue:
MTSummit
Publisher:
European Association for Machine Translation
Pages:
4–23
URL:
https://preview.aclanthology.org/mtsummit-25-ingestion/2025.mtsummit-1.3/
Cite (ACL):
Ziqian Peng, Rachel Bawden, and François Yvon. 2025. Investigating Length Issues in Document-level Machine Translation. In Proceedings of Machine Translation Summit XX: Volume 1, pages 4–23, Geneva, Switzerland. European Association for Machine Translation.
Cite (Informal):
Investigating Length Issues in Document-level Machine Translation (Peng et al., MTSummit 2025)
PDF:
https://preview.aclanthology.org/mtsummit-25-ingestion/2025.mtsummit-1.3.pdf