A Measure of the System Dependence of Automated Metrics

Pius Von Däniken, Jan Milan Deriu, Mark Cieliebak


Abstract
Automated metrics for Machine Translation have made significant progress toward the goal of replacing expensive and time-consuming human evaluations. These metrics are typically assessed by their correlation with human judgments, which captures the monotonic relationship between human and metric scores. However, we argue that it is equally important to ensure that metrics treat all systems fairly and consistently. In this paper, we introduce a method to evaluate this aspect.
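A rank correlation such as Kendall's tau depends only on how pairs of items are ordered, which is what a "monotonic relationship" between human and metric scores amounts to. The sketch below computes Kendall's tau-a in plain Python. It is an illustration of that general idea, not the method proposed in the paper. The scores are hypothetical, and tau-a (which assumes no ties) is chosen only for simplicity.

```python
from itertools import combinations


def kendall_tau(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / number of pairs. Assumes no ties."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    concordant = discordant = 0
    for i, j in combinations(range(len(xs)), 2):
        # Positive product: both sequences order items i and j the same way.
        s = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical human scores and two hypothetical metric outputs.
human = [1.0, 2.0, 3.0, 4.0, 5.0]
metric_linear = [0.1, 0.2, 0.3, 0.4, 0.5]
metric_squashed = [0.01, 0.04, 0.09, 0.16, 0.25]  # monotone but non-linear in human

print(kendall_tau(human, metric_linear))    # 1.0
print(kendall_tau(human, metric_squashed))  # 1.0
```

Both hypothetical metrics reach a perfect score even though their scales differ. A statistic of this kind reflects only ordering, so by itself it does not indicate whether a metric scores different systems on a consistent scale.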
Anthology ID:
2025.acl-short.8
Volume:
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
Month:
July
Year:
2025
Address:
Vienna, Austria
Editors:
Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
87–99
URL:
https://preview.aclanthology.org/ingestion-acl-25/2025.acl-short.8/
Cite (ACL):
Pius Von Däniken, Jan Milan Deriu, and Mark Cieliebak. 2025. A Measure of the System Dependence of Automated Metrics. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 87–99, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
A Measure of the System Dependence of Automated Metrics (Von Däniken et al., ACL 2025)
PDF:
https://preview.aclanthology.org/ingestion-acl-25/2025.acl-short.8.pdf