ZeroSumEval: An Extensible Framework For Scaling LLM Evaluation with Inter-Model Competition

Hisham Abdullah Alyahya, Haidar Khan, Yazeed Alnumay, M Saiful Bari, Bulent Yener


Abstract
We introduce ZeroSumEval, a dynamic, competition-based, and evolving evaluation framework for Large Language Models (LLMs) that leverages competitive games. ZeroSumEval encompasses a diverse suite of games, including security challenges (Capture the Flag), classic board games (chess), and knowledge tests (MathQuiz). These games are designed to evaluate a range of capabilities such as strategic reasoning, planning, knowledge application, safety, and adaptability. Building upon recent studies that highlight the effectiveness of game-based evaluations for LLMs, ZeroSumEval enhances these approaches by providing a standardized and extensible framework for easily implementing games and by leveraging DSPy to provide a better abstraction for LLM player strategies.
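To make the idea of evaluating models through head-to-head games concrete, the following is a minimal sketch of a two-player zero-sum match loop. It is not ZeroSumEval's API and does not use DSPy: the names `TakeAwayGame`, `play_match`, `play_series`, `naive_player`, and `modular_player` are all hypothetical, the toy stone-taking game stands in for the games named in the abstract, the forfeit-on-illegal-move rule is an assumption, and the two plain-Python policies stand in for LLM-backed players.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class TakeAwayGame:
    """Toy zero-sum game (hypothetical): players alternate removing 1-3 stones;
    whoever takes the last stone wins."""
    stones: int = 10
    to_move: int = 0               # seat whose turn it is (0 or 1)
    winner: Optional[int] = None   # seat index of the winner once the game ends

    def legal_moves(self) -> List[int]:
        return [k for k in (1, 2, 3) if k <= self.stones]

    def apply(self, move: int) -> None:
        if move not in self.legal_moves():
            # Assumption for this sketch: an illegal move forfeits the game.
            self.winner = 1 - self.to_move
            return
        self.stones -= move
        if self.stones == 0:
            self.winner = self.to_move
        else:
            self.to_move = 1 - self.to_move


# A player is any callable mapping the game state to a move. In a real
# evaluation this would wrap a model call; here it is ordinary Python.
Player = Callable[[TakeAwayGame], int]


def naive_player(game: TakeAwayGame) -> int:
    """Always takes a single stone."""
    return 1


def modular_player(game: TakeAwayGame) -> int:
    """Tries to leave the opponent a multiple of four stones."""
    return game.stones % 4 or 1


def play_match(first: Player, second: Player, stones: int = 10) -> int:
    """Play one game and return the winning seat (0 = first, 1 = second)."""
    game = TakeAwayGame(stones=stones)
    seats = (first, second)
    while game.winner is None:
        game.apply(seats[game.to_move](game))
    return game.winner


def play_series(players: Dict[str, Player], stones: int = 10) -> Dict[str, int]:
    """Play two games between two named players, swapping who moves first."""
    a, b = list(players)
    wins = {a: 0, b: 0}
    for first, second in ((a, b), (b, a)):
        seat = play_match(players[first], players[second], stones)
        wins[(first, second)[seat]] += 1
    return wins


if __name__ == "__main__":
    print(play_series({"modular": modular_player, "naive": naive_player}))
```

Swapping seats between games is one simple way to keep first-move advantage from biasing the tally. Any ranking scheme layered on top of the win counts is left out of this sketch.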
Anthology ID:
2025.acl-demo.33
Volume:
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)
Month:
July
Year:
2025
Address:
Vienna, Austria
Editors:
Pushkar Mishra, Smaranda Muresan, Tao Yu
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
340–350
URL:
https://preview.aclanthology.org/ingestion-acl-25/2025.acl-demo.33/
Cite (ACL):
Hisham Abdullah Alyahya, Haidar Khan, Yazeed Alnumay, M Saiful Bari, and Bulent Yener. 2025. ZeroSumEval: An Extensible Framework For Scaling LLM Evaluation with Inter-Model Competition. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations), pages 340–350, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
ZeroSumEval: An Extensible Framework For Scaling LLM Evaluation with Inter-Model Competition (Alyahya et al., ACL 2025)
PDF:
https://preview.aclanthology.org/ingestion-acl-25/2025.acl-demo.33.pdf
Copyright agreement:
 2025.acl-demo.33.copyright_agreement.pdf