Putting Evaluation in Context: Contextual Embeddings Improve Machine Translation Evaluation

Nitika Mathur, Timothy Baldwin, Trevor Cohn


Abstract
Accurate, automatic evaluation of machine translation is critical for system tuning and for evaluating progress in the field. We propose a simple unsupervised metric, and additional supervised metrics which rely on contextual word embeddings to encode the translation and reference sentences. We find that these models rival or surpass all existing metrics in the WMT 2017 sentence-level and system-level tracks, and our trained model has a substantially higher correlation with human judgements than all existing metrics on the WMT 2017 to-English sentence-level dataset.
Anthology ID:
P19-1269
Volume:
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics
Month:
July
Year:
2019
Address:
Florence, Italy
Editors:
Anna Korhonen, David Traum, Lluís Màrquez
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
2799–2808
URL:
https://aclanthology.org/P19-1269
DOI:
10.18653/v1/P19-1269
Cite (ACL):
Nitika Mathur, Timothy Baldwin, and Trevor Cohn. 2019. Putting Evaluation in Context: Contextual Embeddings Improve Machine Translation Evaluation. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 2799–2808, Florence, Italy. Association for Computational Linguistics.
Cite (Informal):
Putting Evaluation in Context: Contextual Embeddings Improve Machine Translation Evaluation (Mathur et al., ACL 2019)
PDF:
https://preview.aclanthology.org/nschneid-patch-5/P19-1269.pdf
Video:
https://preview.aclanthology.org/nschneid-patch-5/P19-1269.mp4
Code:
nitikam/mteval-in-context
Data:
WMT 2016