Part Represents Whole: Improving the Evaluation of Machine Translation System Using Entropy Enhanced Metrics

Yilun Liu, Shimin Tao, Chang Su, Min Zhang, Yanqing Zhao, Hao Yang


Abstract
Machine translation (MT) metrics often correlate poorly with human assessments. In MT system evaluation, most metrics pay equal attention to every sample in an evaluation set, whereas in human evaluation, difficult sentences often make candidate systems distinguishable through notable fluctuations in human scores, especially when systems are competitive. We find that samples with high entropy values, although they usually account for less than 5% of an evaluation set, tend to play a key role in MT evaluation: when the evaluation set is shrunk to only the high-entropy portion, correlations with human assessments actually improve. Thus, in this paper, we propose a fast and unsupervised approach to enhance MT metrics using entropy, expanding the dimension of evaluation by introducing sentence-level difficulty. A translation hypothesis with a significantly high entropy value is considered difficult and receives a large weight in the aggregation of system-level scores. Experimental results on five sub-tracks of the WMT19 Metrics shared task show that our proposed method significantly enhances the performance of commonly used MT metrics in terms of system-level correlations with human assessments, even outperforming existing SOTA metrics. In particular, all enhanced metrics remain stable in their correlations with human assessments when only competitive MT systems are included, while the corresponding vanilla metrics fail to correlate with human assessments.
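The aggregation idea described in the abstract can be illustrated with a minimal sketch. It is not the paper's formulation: the abstract does not say how entropy is computed or how weights are derived, so the sketch takes per-sentence entropy values as given input and assumes a simple two-level scheme in which the highest-entropy fraction of samples receives a larger weight. The function name and the parameters `top_fraction` and `high_weight` are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def entropy_weighted_system_score(
    sentence_scores: Sequence[float],
    entropies: Sequence[float],
    top_fraction: float = 0.05,  # assumption: share of samples treated as difficult
    high_weight: float = 10.0,   # assumption: weight applied to difficult samples
) -> float:
    """Aggregate sentence-level metric scores into one system-level score,
    up-weighting the samples with the highest entropy values."""
    if len(sentence_scores) != len(entropies):
        raise ValueError("sentence_scores and entropies must have equal length")
    n = len(sentence_scores)
    if n == 0:
        raise ValueError("at least one sample is required")

    # Number of samples regarded as difficult (always at least one).
    k = max(1, round(n * top_fraction))

    # Indices of the k samples with the highest entropy.
    by_entropy = sorted(range(n), key=lambda i: entropies[i], reverse=True)
    difficult = set(by_entropy[:k])

    # Difficult samples get high_weight; all others get weight 1.
    weights = [high_weight if i in difficult else 1.0 for i in range(n)]

    # Weighted mean of the sentence-level scores.
    weighted_sum = sum(w * s for w, s in zip(weights, sentence_scores))
    return weighted_sum / sum(weights)
```

With `high_weight=1.0` the function reduces to the plain mean that treats every sample equally, which corresponds to the vanilla aggregation the abstract contrasts against.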
Anthology ID:
2022.findings-aacl.28
Volume:
Findings of the Association for Computational Linguistics: AACL-IJCNLP 2022
Month:
November
Year:
2022
Address:
Online only
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
296–307
URL:
https://aclanthology.org/2022.findings-aacl.28
Cite (ACL):
Yilun Liu, Shimin Tao, Chang Su, Min Zhang, Yanqing Zhao, and Hao Yang. 2022. Part Represents Whole: Improving the Evaluation of Machine Translation System Using Entropy Enhanced Metrics. In Findings of the Association for Computational Linguistics: AACL-IJCNLP 2022, pages 296–307, Online only. Association for Computational Linguistics.
Cite (Informal):
Part Represents Whole: Improving the Evaluation of Machine Translation System Using Entropy Enhanced Metrics (Liu et al., Findings 2022)
PDF:
https://preview.aclanthology.org/ingestion-script-update/2022.findings-aacl.28.pdf