Don’t Sweat the Small Stuff: Segment-Level Meta-Evaluation Based on Pairwise Difference Correlation

Colten DiIanni, Daniel Deutsch


Abstract
This paper introduces Pairwise Difference Pearson (PDP), a novel segment-level meta-evaluation metric for Machine Translation (MT) that addresses limitations in previous Pearson’s 𝜌-based and Kendall’s 𝜏-based meta-evaluation approaches. PDP is a correlation-based metric that utilizes pairwise differences rather than raw scores. It draws on information from all segments for a more robust understanding of score distributions and uses only pairwise differences to refine Global Pearson to intra-segment comparisons. Analysis on the WMT’24 shared task shows PDP properly ranks sentinel evaluation metrics and better aligns with human error weightings than acc_eq.
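The abstract's description of PDP can be illustrated with a minimal sketch. It is one plausible reading of the abstract, not the paper's exact definition: for each source segment, take the differences between every pair of system scores, pool those differences across all segments, and compute a single Pearson correlation between the human and metric differences. The function names (`pairwise_differences`, `pearson`, `pdp_sketch`), the input layout (one list of per-system scores per segment), and the choice of unordered pairs are all assumptions made for illustration.

```python
import math
from itertools import combinations


def pairwise_differences(scores):
    """Differences between every unordered pair of scores within one segment."""
    return [a - b for a, b in combinations(scores, 2)]


def pearson(xs, ys):
    """Plain Pearson correlation; returns NaN if either input is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / math.sqrt(var_x * var_y)


def pdp_sketch(human, metric):
    """Hypothetical PDP-style score (an assumption, not the paper's formula).

    human, metric: lists of segments; each segment is a list with one score
    per system, in the same system order for both inputs.
    """
    human_diffs, metric_diffs = [], []
    for human_seg, metric_seg in zip(human, metric):
        # Only compare translations of the same source segment.
        human_diffs.extend(pairwise_differences(human_seg))
        metric_diffs.extend(pairwise_differences(metric_seg))
    # One correlation over differences pooled from all segments.
    return pearson(human_diffs, metric_diffs)


# Toy data: the metric equals 2 * human plus a different offset per segment.
human_scores = [[0.0, -1.0, -5.0], [-2.0, -2.0, -10.0]]
metric_scores = [[10.0, 8.0, 0.0], [96.0, 96.0, 80.0]]
score = pdp_sketch(human_scores, metric_scores)
```

In this toy input the per-segment offsets cancel when differences are taken, so the pooled differences are exactly proportional. This matches the abstract's point that only intra-segment comparisons enter the correlation while every segment still contributes to it.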
Anthology ID:
2025.emnlp-main.1273
Volume:
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
25073–25081
URL:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.1273/
Cite (ACL):
Colten DiIanni and Daniel Deutsch. 2025. Don’t Sweat the Small Stuff: Segment-Level Meta-Evaluation Based on Pairwise Difference Correlation. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 25073–25081, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
Don’t Sweat the Small Stuff: Segment-Level Meta-Evaluation Based on Pairwise Difference Correlation (DiIanni & Deutsch, EMNLP 2025)
PDF:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.1273.pdf
Checklist:
 2025.emnlp-main.1273.checklist.pdf