@inproceedings{wu-monz-2025-uva,
    title = "{U}v{A}-{MT} at {WMT}25 Evaluation Task: {LLM} Uncertainty as a Proxy for Translation Quality",
    author = "Wu, Di  and
      Monz, Christof",
    editor = "Haddow, Barry  and
      Kocmi, Tom  and
      Koehn, Philipp  and
      Monz, Christof",
    booktitle = "Proceedings of the Tenth Conference on Machine Translation",
    month = nov,
    year = "2025",
    address = "Suzhou, China",
    publisher = "Association for Computational Linguistics",
    url = "https://preview.aclanthology.org/ingest-emnlp/2025.wmt-1.72/",
    pages = "974--983",
    ISBN = "979-8-89176-341-8",
    abstract = "This year, we focus exclusively on using the uncertainty quantification as a proxy for translation quality. While this has traditionally been regarded as a form of unsupervised quality estimation, such signals have been overlooked in the design of the current metric models{---}we show their value in the context of LLMs. More specifically, in contrast to conventional unsupervised QE methods, we apply recent calibration technology to adjust translation likelihoods to better align with quality signals, and we use the single resulting model to participate in both the general translation and QE tracks at WMT25.Our offline experiments show some advantages: 1) uncertainty signals extracted from LLMs, like Tower or Gemma-3, provide accurate quality predictions; and 2) calibration technology further improves this QE performance, sometimes even surpassing certain metric models that were trained with human annotations, such as CometKiwi. We therefore argue that uncertainty quantification (confidence), especially from LLMs, can serve as a strong and complementary signal for the metric design, particularly when human-annotated data are lacking. However, we also identify limitations, such as its tendency to assign disproportionately higher scores to hypotheses generated by the model itself."
}
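
The core signal the abstract describes, an LLM's confidence in a fixed translation hypothesis, can be illustrated with a minimal sketch: scoring a hypothesis by its length-normalized log-likelihood under a causal LM. The model name, prompt template, and language pair below are illustrative assumptions, not the authors' exact setup, and the paper's calibration step (adjusting these raw likelihoods to better track quality) is not shown here.

```python
# Minimal sketch: length-normalized sequence log-likelihood as an
# unsupervised quality-estimation (QE) signal, in the spirit of
# "LLM uncertainty as a proxy for translation quality".
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumption: any Hugging Face causal LM works here; the paper uses
# models such as Tower and Gemma-3.
MODEL_NAME = "google/gemma-3-1b-it"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()


def confidence_score(source: str, hypothesis: str) -> float:
    """Average log-probability of the hypothesis tokens given a
    translation prompt; higher means more confident and, ideally,
    a better translation."""
    # Assumed prompt template; the prompt tokens must form a prefix
    # of the full tokenization for the slicing below to be exact.
    prompt = f"Translate to German: {source}\nTranslation: "
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    full_ids = tokenizer(prompt + hypothesis, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    # Position t predicts token t+1, so align logits with shifted targets.
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    targets = full_ids[0, 1:]
    token_lp = log_probs.gather(1, targets.unsqueeze(1)).squeeze(1)
    # Keep only the log-probs of the hypothesis tokens.
    hyp_lp = token_lp[prompt_ids.shape[1] - 1 :]
    return hyp_lp.mean().item()


print(confidence_score("The cat sat on the mat.",
                       "Die Katze saß auf der Matte."))
```

A score like this can rank competing hypotheses without any human-annotated training data, which is what lets the single model serve both the translation and QE tracks; the self-scoring bias the abstract notes (inflated scores for the model's own outputs) applies directly to this kind of likelihood-based signal.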