Abstract
In this second edition of the Metric Score Landscape Challenge (MSLC), we examine how automatic metrics for machine translation perform on a wide variety of machine translation output, ranging from very low-quality systems to the types of high-quality systems submitted to the General MT shared task at WMT. We also explore metric results on specific types of data, such as empty strings, wrong- or mixed-language text, and more. We raise several alarms about inconsistencies in metric scores, some of which can be resolved by increasingly explicit instructions for metric use, while others highlight technical flaws.

- Anthology ID: 2024.wmt-1.34
- Volume: Proceedings of the Ninth Conference on Machine Translation
- Month: November
- Year: 2024
- Address: Miami, Florida, USA
- Editors: Barry Haddow, Tom Kocmi, Philipp Koehn, Christof Monz
- Venue: WMT
- Publisher: Association for Computational Linguistics
- Pages: 475–491
- URL: https://aclanthology.org/2024.wmt-1.34
- DOI: 10.18653/v1/2024.wmt-1.34
- Cite (ACL): Rebecca Knowles, Samuel Larkin, and Chi-Kiu Lo. 2024. MSLC24: Further Challenges for Metrics on a Wide Landscape of Translation Quality. In Proceedings of the Ninth Conference on Machine Translation, pages 475–491, Miami, Florida, USA. Association for Computational Linguistics.
- Cite (Informal): MSLC24: Further Challenges for Metrics on a Wide Landscape of Translation Quality (Knowles et al., WMT 2024)
- PDF: https://preview.aclanthology.org/dois-2013-emnlp/2024.wmt-1.34.pdf