Vote’n’Rank: Revision of Benchmarking with Social Choice Theory
Mark Rofin, Vladislav Mikhailov, Mikhail Florinsky, Andrey Kravchenko, Tatiana Shavrina, Elena Tutubalina, Daniel Karabekyan, Ekaterina Artemova
Abstract
The development of state-of-the-art systems in different applied areas of machine learning (ML) is driven by benchmarks, which have shaped the paradigm of evaluating generalisation capabilities from multiple perspectives. Although the paradigm is shifting towards more fine-grained evaluation across diverse tasks, the delicate question of how to aggregate the performances has received particular interest in the community. In general, benchmarks follow unspoken utilitarian principles, where systems are ranked by their mean score over task-specific metrics. Such an aggregation procedure has been viewed as a sub-optimal evaluation protocol, which may have created the illusion of progress. This paper proposes Vote’n’Rank, a framework for ranking systems in multi-task benchmarks under the principles of social choice theory. We demonstrate that our approach can be efficiently utilised to draw new insights on benchmarking in several ML sub-fields and identify the best-performing systems in research and development case studies. Vote’n’Rank’s procedures are more robust than the mean while being able to handle missing performance scores and determine conditions under which a system becomes the winner.
- Anthology ID:
- 2023.eacl-main.48
- Original:
- 2023.eacl-main.48v1
- Version 2:
- 2023.eacl-main.48v2
- Volume:
- Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics
- Month:
- May
- Year:
- 2023
- Address:
- Dubrovnik, Croatia
- Editors:
- Andreas Vlachos, Isabelle Augenstein
- Venue:
- EACL
- Publisher:
- Association for Computational Linguistics
- Pages:
- 670–686
- URL:
- https://preview.aclanthology.org/remove-affiliations/2023.eacl-main.48/
- DOI:
- 10.18653/v1/2023.eacl-main.48
- Cite (ACL):
- Mark Rofin, Vladislav Mikhailov, Mikhail Florinsky, Andrey Kravchenko, Tatiana Shavrina, Elena Tutubalina, Daniel Karabekyan, and Ekaterina Artemova. 2023. Vote’n’Rank: Revision of Benchmarking with Social Choice Theory. In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, pages 670–686, Dubrovnik, Croatia. Association for Computational Linguistics.
- Cite (Informal):
- Vote’n’Rank: Revision of Benchmarking with Social Choice Theory (Rofin et al., EACL 2023)
- PDF:
- https://preview.aclanthology.org/remove-affiliations/2023.eacl-main.48.pdf