BLEU Meets COMET: Combining Lexical and Neural Metrics Towards Robust Machine Translation Evaluation

Taisiya Glushkova; Chrysoula Zerva; André F. T. Martins

Abstract

Although neural machine translation evaluation metrics, such as COMET or BLEURT, have achieved strong correlations with human judgements, they are sometimes unreliable at detecting certain phenomena that can be considered critical errors, such as deviations in entities and numbers. In contrast, traditional evaluation metrics such as BLEU or chrF, which measure lexical or character overlap between translation hypotheses and human references, have lower correlations with human judgements but are sensitive to such deviations. In this paper, we investigate several ways of combining the two approaches in order to increase the robustness of state-of-the-art evaluation methods to translations with critical errors. We show that by using additional information during training, such as sentence-level features and word-level tags, the trained metrics improve their capability to penalize translations with specific troublesome phenomena, which leads to gains in correlations with human judgements and on the recent DEMETR benchmark for several language pairs.

Anthology ID: 2023.eamt-1.6
Volume: Proceedings of the 24th Annual Conference of the European Association for Machine Translation
Month: June
Year: 2023
Address: Tampere, Finland
Editors: Mary Nurminen, Judith Brenner, Maarit Koponen, Sirkku Latomaa, Mikhail Mikhailov, Frederike Schierl, Tharindu Ranasinghe, Eva Vanmassenhove, Sergi Alvarez Vidal, Nora Aranberri, Mara Nunziatini, Carla Parra Escartín, Mikel Forcada, Maja Popovic, Carolina Scarton, Helena Moniz
Venue: EAMT
Publisher: European Association for Machine Translation
Pages: 47–58
URL: https://preview.aclanthology.org/fix-sig-urls/2023.eamt-1.6/
Cite (ACL): Taisiya Glushkova, Chrysoula Zerva, and André F. T. Martins. 2023. BLEU Meets COMET: Combining Lexical and Neural Metrics Towards Robust Machine Translation Evaluation. In Proceedings of the 24th Annual Conference of the European Association for Machine Translation, pages 47–58, Tampere, Finland. European Association for Machine Translation.
Cite (Informal): BLEU Meets COMET: Combining Lexical and Neural Metrics Towards Robust Machine Translation Evaluation (Glushkova et al., EAMT 2023)
PDF: https://preview.aclanthology.org/fix-sig-urls/2023.eamt-1.6.pdf
