WiCkeD: A Simple Method to Make Multiple Choice Benchmarks More Challenging

Ahmed Elhady, Eneko Agirre, Mikel Artetxe

Abstract
We introduce WiCkeD, a simple method to increase the complexity of existing multiple-choice benchmarks by randomly replacing a choice with “None of the above”, a strategy often used in educational tests. We show that WiCkeD can be automatically applied to any existing benchmark, making it more challenging. We apply WiCkeD to 6 popular benchmarks and use it to evaluate 18 open-weight LLMs. The performance of the models drops 12.1 points on average with respect to the original versions of the datasets. When using chain-of-thought on 3 MMLU datasets, the performance drop for the WiCkeD variant is similar to the one observed when using the LLMs directly, showing that WiCkeD is also challenging for models with enhanced reasoning abilities. WiCkeD also uncovers that some models are more sensitive to the extra reasoning required, providing additional information with respect to the original benchmarks. We release our code and data at github.com/anonymized.
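The transformation described in the abstract can be sketched in a few lines of Python. This is a minimal illustration based only on the abstract's description, not the authors' released code: one randomly chosen option is replaced with “None of the above”, and the handling of the gold label (making “None of the above” the answer when the correct choice happens to be removed) as well as placing the new option last are assumptions.

    import random

    def wicked(question, choices, answer_idx, rng=random.Random(0)):
        """Sketch of the WiCkeD transformation: replace one randomly
        chosen option with "None of the above". Gold-label handling
        below is an assumption, not necessarily the authors' procedure."""
        choices = list(choices)
        drop_idx = rng.randrange(len(choices))   # option to replace
        none_text = "None of the above"
        # Keep "None of the above" as the last option (common test convention).
        remaining = [c for i, c in enumerate(choices) if i != drop_idx]
        new_choices = remaining + [none_text]
        if drop_idx == answer_idx:
            # The correct answer was removed, so "None of the above" is now gold.
            new_answer_idx = len(new_choices) - 1
        else:
            # Gold index shifts down if an earlier option was removed.
            new_answer_idx = answer_idx - (1 if drop_idx < answer_idx else 0)
        return question, new_choices, new_answer_idx

    # Example on a hypothetical item:
    q, opts, gold = wicked(
        "What is the capital of France?",
        ["Berlin", "Paris", "Madrid", "Rome"],
        answer_idx=1,
    )
    print(opts, "gold:", opts[gold])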
Anthology ID:
2025.acl-short.94
Volume:
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
Month:
July
Year:
2025
Address:
Vienna, Austria
Editors:
Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
1183–1192
URL:
https://preview.aclanthology.org/landing_page/2025.acl-short.94/
Cite (ACL):
Ahmed Elhady, Eneko Agirre, and Mikel Artetxe. 2025. WiCkeD: A Simple Method to Make Multiple Choice Benchmarks More Challenging. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 1183–1192, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
WiCkeD: A Simple Method to Make Multiple Choice Benchmarks More Challenging (Elhady et al., ACL 2025)
PDF:
https://preview.aclanthology.org/landing_page/2025.acl-short.94.pdf