Don’t Sweat the Small Stuff: Segment-Level Meta-Evaluation Based on Pairwise Difference Correlation

Colten DiIanni; Daniel Deutsch

Abstract

This paper introduces Pairwise Difference Pearson (PDP), a novel segment-level meta-evaluation metric for Machine Translation (MT) that addresses limitations of previous meta-evaluation approaches based on Pearson’s 𝜌 and Kendall’s 𝜏. PDP is a correlation-based metric computed on pairwise score differences rather than raw scores. It draws on information from all segments for a more robust understanding of score distributions, and by using only pairwise differences it restricts Global Pearson to intra-segment comparisons. Analysis on the WMT’24 shared task shows that PDP properly ranks sentinel evaluation metrics and aligns better with human error weightings than acc_eq.
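The abstract does not give a formula, so the following is only a minimal sketch of one plausible reading: for each segment, take the score difference of every pair of translations under the metric and under the human ratings, pool those differences across all segments, and compute a single Pearson correlation over the pooled values. The function names, the nested-list input layout, and the choice to include each pair in both orders (so that the result does not depend on how translations are ordered) are assumptions made for illustration, not the authors' published definition.

```python
from itertools import permutations
from math import sqrt


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def pairwise_difference_pearson(metric_scores, human_scores):
    """Hypothetical PDP sketch (assumed reading of the abstract).

    metric_scores[s][k] and human_scores[s][k] are the metric and human
    scores for translation k of segment s. Differences are only taken
    between translations of the same segment, then pooled over segments.
    """
    metric_diffs, human_diffs = [], []
    for metric_seg, human_seg in zip(metric_scores, human_scores):
        if len(metric_seg) != len(human_seg):
            raise ValueError("segment score lists must have equal length")
        # Assumption: use ordered pairs (i, j) and (j, i) so the pooled
        # differences are symmetric and independent of system ordering.
        for i, j in permutations(range(len(metric_seg)), 2):
            metric_diffs.append(metric_seg[i] - metric_seg[j])
            human_diffs.append(human_seg[i] - human_seg[j])
    return pearson(metric_diffs, human_diffs)


# Toy data: the metric tracks human differences within each segment,
# but uses a different offset per segment.
human = [[1.0, 2.0, 4.0], [10.0, 10.0, 13.0]]
metric = [[0.1, 0.2, 0.4], [5.0, 5.0, 5.3]]
pdp = pairwise_difference_pearson(metric, human)
```

Under this reading, per-segment offsets cancel because only intra-segment differences enter the correlation, while the size of each difference still matters, unlike in a purely rank-based statistic.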

Anthology ID: 2025.emnlp-main.1273
Volume: Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month: November
Year: 2025
Address: Suzhou, China
Editors: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 25073–25081
URL: https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.1273/
Cite (ACL): Colten DiIanni and Daniel Deutsch. 2025. Don’t Sweat the Small Stuff: Segment-Level Meta-Evaluation Based on Pairwise Difference Correlation. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 25073–25081, Suzhou, China. Association for Computational Linguistics.
Cite (Informal): Don’t Sweat the Small Stuff: Segment-Level Meta-Evaluation Based on Pairwise Difference Correlation (DiIanni & Deutsch, EMNLP 2025)
PDF: https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.1273.pdf
Checklist: 2025.emnlp-main.1273.checklist.pdf
