Do You Get the Hint? Benchmarking LLMs on the Board Game Concept

Ine Gevers, Walter Daelemans


Abstract
Large language models (LLMs) have achieved striking successes on many benchmarks, yet recent studies continue to expose fundamental weaknesses. In this paper, we introduce Concept, a simple word-guessing board game, as a benchmark for probing abductive reasoning. Our results show that this game, which humans solve easily (with a success rate of over 90%), remains very challenging for state-of-the-art LLMs (no model exceeds a 40% success rate). Specifically, we observe that LLMs struggle to interpret other players’ strategic intents and to correct their initial hypotheses as new information arrives sequentially. In addition, we extend the evaluation to multiple languages and find that LLM performance drops further in lower-resource languages (Dutch, French, and Spanish) compared to English.
Anthology ID:
2026.findings-acl.1219
Volume:
Findings of the Association for Computational Linguistics: ACL 2026
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
24358–24371
URL:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1219/
Cite (ACL):
Ine Gevers and Walter Daelemans. 2026. Do You Get the Hint? Benchmarking LLMs on the Board Game Concept. In Findings of the Association for Computational Linguistics: ACL 2026, pages 24358–24371, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
Do You Get the Hint? Benchmarking LLMs on the Board Game Concept (Gevers & Daelemans, Findings 2026)
PDF:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1219.pdf
Checklist:
 2026.findings-acl.1219.checklist.pdf