Long-context Reference-based MT Quality Estimation
Sami Haq, Chinonso Osuji, Sheila Castilho, Brian Davis, Thiago Castro Ferreira
Abstract
In this paper, we present our submission to the Tenth Conference on Machine Translation (WMT25) Shared Task on Automated Translation Quality Evaluation. Our systems are built upon the COMET framework and trained to predict segment-level ESA scores using augmented long-context data. To construct long-context training examples, we concatenate multiple in-domain sentences and compute a weighted average of their scores. We further integrate human judgment datasets (MQM, SQM, and DA) through score normalisation and train multilingual models on the source, hypothesis, and reference translations. Experimental results demonstrate that incorporating long-context information yields higher correlations with human judgments compared to models trained exclusively on short segments.
- Anthology ID:
- 2025.wmt-1.64
- Volume:
- Proceedings of the Tenth Conference on Machine Translation
- Month:
- November
- Year:
- 2025
- Address:
- Suzhou, China
- Editors:
- Barry Haddow, Tom Kocmi, Philipp Koehn, Christof Monz
- Venue:
- WMT
- Publisher:
- Association for Computational Linguistics
- Pages:
- 905–912
- URL:
- https://preview.aclanthology.org/ingest-emnlp/2025.wmt-1.64/
- Cite (ACL):
- Sami Haq, Chinonso Osuji, Sheila Castilho, Brian Davis, and Thiago Castro Ferreira. 2025. Long-context Reference-based MT Quality Estimation. In Proceedings of the Tenth Conference on Machine Translation, pages 905–912, Suzhou, China. Association for Computational Linguistics.
- Cite (Informal):
- Long-context Reference-based MT Quality Estimation (Haq et al., WMT 2025)
- PDF:
- https://preview.aclanthology.org/ingest-emnlp/2025.wmt-1.64.pdf