Robust MT Evaluation with Sentence-level Multilingual Augmentation

Duarte Alves, Ricardo Rei, Ana C Farinha, José G. C. de Souza, André F. T. Martins


Abstract
Automatic translations with critical errors may lead to misinterpretations and pose several risks for the user. As such, it is important that Machine Translation (MT) Evaluation systems are robust to these errors in order to increase the reliability and safety of Machine Translation systems. Here we introduce SMAUG, a novel Sentence-level Multilingual AUGmentation approach for generating translations with critical errors, and apply this approach to create a test set to evaluate the robustness of MT metrics to these errors. We show that current state-of-the-art metrics are improving their capability to distinguish translations with and without critical errors and to penalize the former accordingly. We also show that metrics tend to struggle with errors related to named entities and numbers, and that there is a high variance in the robustness of current methods to translations with critical errors.
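For illustration only, one kind of critical error the abstract mentions (a wrong number) could be injected into a translation roughly as sketched below. This is not the SMAUG implementation: the function name `perturb_number`, the regular expression, and the replacement rule are all assumptions made for this example.

```python
import random
import re
from typing import Optional


def perturb_number(sentence: str, rng: random.Random) -> Optional[str]:
    """Hypothetical perturbation: swap the first integer for a different one.

    Returns None when the sentence contains no digits, so callers can skip
    sentences that this perturbation does not apply to.
    """
    match = re.search(r"\d+", sentence)
    if match is None:
        return None

    original = int(match.group())
    new_value = original
    # Resample until the value actually changes; the range always holds
    # at least 11 candidates, so this loop terminates.
    while new_value == original:
        new_value = rng.randint(0, max(10, original * 2))

    return sentence[:match.start()] + str(new_value) + sentence[match.end():]


if __name__ == "__main__":
    rng = random.Random(0)
    print(perturb_number("The meeting starts at 9 and ends at 11.", rng))
```

A robust metric would be expected to score the perturbed sentence lower than the unperturbed translation, since the two differ only in the injected error.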
Anthology ID:
2022.wmt-1.43
Volume:
Proceedings of the Seventh Conference on Machine Translation (WMT)
Month:
December
Year:
2022
Address:
Abu Dhabi, United Arab Emirates (Hybrid)
Venue:
WMT
Publisher:
Association for Computational Linguistics
Pages:
469–478
URL:
https://aclanthology.org/2022.wmt-1.43
Cite (ACL):
Duarte Alves, Ricardo Rei, Ana C Farinha, José G. C. de Souza, and André F. T. Martins. 2022. Robust MT Evaluation with Sentence-level Multilingual Augmentation. In Proceedings of the Seventh Conference on Machine Translation (WMT), pages 469–478, Abu Dhabi, United Arab Emirates (Hybrid). Association for Computational Linguistics.
Cite (Informal):
Robust MT Evaluation with Sentence-level Multilingual Augmentation (Alves et al., WMT 2022)
PDF:
https://preview.aclanthology.org/nodalida-main-page/2022.wmt-1.43.pdf
Software:
 2022.wmt-1.43.software.zip