Chinonso Osuji
2025
Long-context Reference-based MT Quality Estimation
Sami Haq | Chinonso Osuji | Sheila Castilho | Brian Davis | Thiago Castro Ferreira
Proceedings of the Tenth Conference on Machine Translation
In this paper, we present our submission to the Tenth Conference on Machine Translation (WMT25) Shared Task on Automated Translation Quality Evaluation. Our systems build on the COMET framework and are trained to predict segment-level ESA scores using augmented long-context data. To construct long-context training examples, we concatenate multiple in-domain sentences and compute a weighted average of their scores. We further integrate human judgment datasets (MQM, SQM, and DA) through score normalisation and train multilingual models on the source, hypothesis, and reference translations. Experimental results show that incorporating long-context information yields higher correlations with human judgments than training exclusively on short segments.
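
The data construction described in the abstract can be illustrated with a minimal sketch. The abstract does not state which weights or which normalisation the authors use, so everything specific here is an assumption made for illustration: the `Segment` class, the function names, min-max scaling onto [0, 1], and weighting each score by the character length of its hypothesis.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Segment:
    """One training example (hypothetical container, not from the paper)."""
    source: str
    hypothesis: str
    reference: str
    score: float  # human score, assumed to be on a common scale already


def min_max_normalise(scores: Sequence[float], low: float, high: float) -> List[float]:
    """Map raw scores from [low, high] onto [0, 1].

    Assumption: min-max scaling is one simple way to bring differently
    scaled annotations onto a shared range; the paper may use another.
    """
    if high <= low:
        raise ValueError("high must be greater than low")
    return [(s - low) / (high - low) for s in scores]


def build_long_context(segments: Sequence[Segment], sep: str = " ") -> Segment:
    """Concatenate segments and average their scores.

    Assumption: each score is weighted by the character length of its
    hypothesis, so longer sentences contribute more to the joint score.
    """
    if not segments:
        raise ValueError("need at least one segment")
    # Guard against empty hypotheses so the total weight is never zero.
    weights = [max(len(s.hypothesis), 1) for s in segments]
    total = sum(weights)
    score = sum(w * s.score for w, s in zip(weights, segments)) / total
    return Segment(
        source=sep.join(s.source for s in segments),
        hypothesis=sep.join(s.hypothesis for s in segments),
        reference=sep.join(s.reference for s in segments),
        score=score,
    )
```

Under these assumptions, two segments with hypotheses of 2 and 6 characters and scores 1.0 and 0.5 combine into one example with score (2 × 1.0 + 6 × 0.5) / 8 = 0.625. Other weightings, such as token counts or uniform weights, would fit the abstract's description equally well.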