Long-context Reference-based MT Quality Estimation

Sami Haq, Chinonso Osuji, Sheila Castilho, Brian Davis, Thiago Castro Ferreira


Abstract
In this paper, we present our submission to the Tenth Conference on Machine Translation (WMT25) Shared Task on Automated Translation Quality Evaluation. Our systems are built upon the COMET framework and trained to predict segment-level ESA scores using augmented long-context data. To construct long-context training examples, we concatenate multiple in-domain sentences and compute a weighted average of their scores. We further integrate human judgment datasets (MQM, SQM, and DA) through score normalisation and train multilingual models on the source, hypothesis, and reference translations. Experimental results demonstrate that incorporating long-context information yields higher correlations with human judgments compared to models trained exclusively on short segments.
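The abstract describes building long-context training examples by concatenating in-domain segments and combining their ESA scores, plus normalising scores from different annotation schemes. The sketch below illustrates one plausible reading of that procedure; the length-based weighting and the min-max normalisation are assumptions for illustration, not details taken from the paper, and the names used are hypothetical.

```python
# Hypothetical sketch of long-context example construction (not the authors' released code).
# Assumption: segment scores are averaged with weights proportional to hypothesis length.

from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    src: str      # source sentence
    mt: str       # hypothesis (machine translation)
    ref: str      # reference translation
    score: float  # segment-level quality score (e.g., ESA)


def build_long_context(segments: List[Segment]) -> Segment:
    """Concatenate consecutive in-domain segments into one long-context example.

    The combined score is a weighted average of the segment scores,
    here weighted by hypothesis length in tokens (an assumption).
    """
    weights = [len(s.mt.split()) for s in segments]
    total = sum(weights) or 1
    score = sum(w * s.score for w, s in zip(weights, segments)) / total
    return Segment(
        src=" ".join(s.src for s in segments),
        mt=" ".join(s.mt for s in segments),
        ref=" ".join(s.ref for s in segments),
        score=score,
    )


def min_max_normalise(scores: List[float]) -> List[float]:
    """Map scores from one annotation scheme (MQM, SQM, or DA) onto [0, 1].

    Min-max scaling is only one plausible normalisation; the paper does not
    specify the exact transformation here.
    """
    lo, hi = min(scores), max(scores)
    span = (hi - lo) or 1.0
    return [(s - lo) / span for s in scores]
```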
Anthology ID:
2025.wmt-1.64
Volume:
Proceedings of the Tenth Conference on Machine Translation
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Barry Haddow, Tom Kocmi, Philipp Koehn, Christof Monz
Venue:
WMT
Publisher:
Association for Computational Linguistics
Pages:
905–912
URL:
https://preview.aclanthology.org/ingest-emnlp/2025.wmt-1.64/
Cite (ACL):
Sami Haq, Chinonso Osuji, Sheila Castilho, Brian Davis, and Thiago Castro Ferreira. 2025. Long-context Reference-based MT Quality Estimation. In Proceedings of the Tenth Conference on Machine Translation, pages 905–912, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
Long-context Reference-based MT Quality Estimation (Haq et al., WMT 2025)
PDF:
https://preview.aclanthology.org/ingest-emnlp/2025.wmt-1.64.pdf