Context Is Ubiquitous, but Rarely Changes Judgments: Revisiting Document-Level MT Evaluation

Ahrii Kim

Abstract

As sentence-level performance in modern Machine Translation (MT) has plateaued, reliable document-level evaluation is increasingly needed. While the recent FALCON framework with pragmatic features offers a promising direction, its reliability and reproducibility are unclear. We address this gap through human evaluation, analyzing sources of low inter-annotator agreement and identifying key factors. Based on these findings, we introduce H-FALCON, a Human-centered refinement of FALCON. Our experiments show that, even with limited annotator consensus, FALCON achieves correlations comparable to or better than standard sentence-level protocols. Furthermore, we find that contextual information is inherent in all sentences, challenging the view that only some require it. This suggests that prior estimates such as “n% of sentences require context” may stem from methodological artifacts. At the same time, we show that while context is pervasive, not all of it directly influences human judgment.

Anthology ID: 2025.wmt-1.5
Volume: Proceedings of the Tenth Conference on Machine Translation
Month: November
Year: 2025
Address: Suzhou, China
Editors: Barry Haddow, Tom Kocmi, Philipp Koehn, Christof Monz
Venue: WMT
Publisher: Association for Computational Linguistics
Pages: 81–97
URL: https://preview.aclanthology.org/ingest-emnlp/2025.wmt-1.5/
Cite (ACL): Ahrii Kim. 2025. Context Is Ubiquitous, but Rarely Changes Judgments: Revisiting Document-Level MT Evaluation. In Proceedings of the Tenth Conference on Machine Translation, pages 81–97, Suzhou, China. Association for Computational Linguistics.
Cite (Informal): Context Is Ubiquitous, but Rarely Changes Judgments: Revisiting Document-Level MT Evaluation (Kim, WMT 2025)
PDF: https://preview.aclanthology.org/ingest-emnlp/2025.wmt-1.5.pdf
