@inproceedings{post-junczys-dowmunt-2024-evaluation,
title = "Evaluation and Large-scale Training for Contextual Machine Translation",
author = "Post, Matt and
Junczys-Dowmunt, Marcin",
editor = "Haddow, Barry and
Kocmi, Tom and
Koehn, Philipp and
Monz, Christof",
booktitle = "Proceedings of the Ninth Conference on Machine Translation",
month = nov,
year = "2024",
address = "Miami, Florida, USA",
publisher = "Association for Computational Linguistics",
url = "https://preview.aclanthology.org/jlcl-multiple-ingestion/2024.wmt-1.112/",
doi = "10.18653/v1/2024.wmt-1.112",
pages = "1125--1139",
abstract = "Despite the fact that context is known to be vital for resolving a range of translation ambiguities, most traditional machine translation systems continue to be trained and to operate at the sentence level. A common explanation is the lack of document-level annotations for existing training data. This work investigates whether having such annotations would be helpful for training traditional MT systems at scale. We build large-scale, state-of-the-art contextual MT systems into German, French, and Russian, fixing the datasets while comparing the effect of sourcing contextual training samples from both parallel and back-translated data. We then evaluate these contextual models across a range of contextual test sets from the literature, where we find that (a) document annotations from both mined parallel and back-translated monolingual data are helpful, but that the best contextual MT systems do not draw contextual samples from the parallel data. We also make two points related to evaluation: (b) contrastive score-based metrics on challenge sets are not discriminative; instead, models must be tested directly on their ability to generate correct outputs, and (c) standard corpus-level metrics such as COMET work best in settings that are dense in contextual phenomena."
}
Markdown (Informal)
[Evaluation and Large-scale Training for Contextual Machine Translation](https://aclanthology.org/2024.wmt-1.112/) (Post & Junczys-Dowmunt, WMT 2024)
ACL
Matt Post and Marcin Junczys-Dowmunt. 2024. Evaluation and Large-scale Training for Contextual Machine Translation. In Proceedings of the Ninth Conference on Machine Translation, pages 1125–1139, Miami, Florida, USA. Association for Computational Linguistics.