Abstract
We develop two new metrics that build on the COMET architecture. The main contribution is collecting a corpus of human judgements ten times larger than that of COMET and investigating how to filter out problematic human judgements. We propose filtering out human judgements in which the human reference is statistically worse than the machine translation. Furthermore, we average the scores of identical segments that were evaluated multiple times. Comparing automatic metrics against source-based DA and MQM-style human judgements, the results show state-of-the-art performance on system-level pairwise system ranking. We release both of our metrics for public use.
- Anthology ID:
- 2022.wmt-1.47
- Volume:
- Proceedings of the Seventh Conference on Machine Translation (WMT)
- Month:
- December
- Year:
- 2022
- Address:
- Abu Dhabi, United Arab Emirates (Hybrid)
- Editors:
- Philipp Koehn, Loïc Barrault, Ondřej Bojar, Fethi Bougares, Rajen Chatterjee, Marta R. Costa-jussà, Christian Federmann, Mark Fishel, Alexander Fraser, Markus Freitag, Yvette Graham, Roman Grundkiewicz, Paco Guzman, Barry Haddow, Matthias Huck, Antonio Jimeno Yepes, Tom Kocmi, André Martins, Makoto Morishita, Christof Monz, Masaaki Nagata, Toshiaki Nakazawa, Matteo Negri, Aurélie Névéol, Mariana Neves, Martin Popel, Marco Turchi, Marcos Zampieri
- Venue:
- WMT
- SIG:
- SIGMT
- Publisher:
- Association for Computational Linguistics
- Pages:
- 541–548
- URL:
- https://aclanthology.org/2022.wmt-1.47
- Cite (ACL):
- Tom Kocmi, Hitokazu Matsushita, and Christian Federmann. 2022. MS-COMET: More and Better Human Judgements Improve Metric Performance. In Proceedings of the Seventh Conference on Machine Translation (WMT), pages 541–548, Abu Dhabi, United Arab Emirates (Hybrid). Association for Computational Linguistics.
- Cite (Informal):
- MS-COMET: More and Better Human Judgements Improve Metric Performance (Kocmi et al., WMT 2022)
- PDF:
- https://preview.aclanthology.org/naacl24-info/2022.wmt-1.47.pdf