GuessingGame: Measuring the Informativeness of Open-Ended Questions in Large Language Models

Dylan Hutson; Daniel Vennemeyer; Aneesh Deshmukh; Justin Zhan; Tianyu Jiang

doi:10.18653/v1/2025.emnlp-main.876

Abstract

We introduce GuessingGame, a protocol for evaluating large language models (LLMs) as strategic question-askers in open-ended, open-domain settings. A Guesser LLM identifies a hidden object by posing free-form questions to an Oracle—without predefined choices or candidate lists. To measure question quality, we propose two information gain (IG) metrics: a Bayesian method that tracks belief updates over semantic concepts using LLM-scored relevance, and an entropy-based method that filters candidates via ConceptNet. Both metrics are model-agnostic and support post hoc analysis. Across 858 games with multiple models and prompting strategies, higher IG strongly predicts efficiency: a one-standard-deviation IG increase reduces expected game length by 43%. Prompting constraints guided by IG—such as enforcing question diversity—enable weaker models to match GPT-4o. These results show that question-asking in LLMs is both measurable and improvable, and crucial for interactive reasoning.
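The abstract names the two information gain metrics without giving their formulas, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a small candidate set with a uniform prior, and it treats an Oracle answer as a per-candidate likelihood. A hard 0/1 likelihood stands in for filtering candidates, and a soft score stands in for LLM-scored relevance. All function names, candidates, and scores are hypothetical.

```python
import math

def entropy_bits(probs):
    """Shannon entropy (in bits) of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def bayesian_update(prior, likelihood):
    """Multiply the prior by a per-candidate likelihood and renormalize."""
    unnorm = {c: prior[c] * likelihood.get(c, 0.0) for c in prior}
    total = sum(unnorm.values())
    if total == 0:
        # Assumption: if the answer rules out every candidate, keep the prior.
        return dict(prior)
    return {c: v / total for c, v in unnorm.items()}

def information_gain(prior, posterior):
    """Reduction in entropy of the belief caused by one question-answer pair."""
    return entropy_bits(prior.values()) - entropy_bits(posterior.values())

# Hypothetical candidate set with a uniform prior (2 bits of uncertainty).
candidates = ["apple", "banana", "hammer", "wrench"]
prior = {c: 1 / len(candidates) for c in candidates}

# Hard filtering: an answer such as "yes, it is edible" removes the tools.
hard = {"apple": 1.0, "banana": 1.0, "hammer": 0.0, "wrench": 0.0}
ig_hard = information_gain(prior, bayesian_update(prior, hard))

# Soft relevance scores (illustrative values only) shift belief without
# eliminating any candidate, so they yield a smaller gain.
soft = {"apple": 0.9, "banana": 0.9, "hammer": 0.1, "wrench": 0.1}
ig_soft = information_gain(prior, bayesian_update(prior, soft))

print(f"hard filter IG: {ig_hard:.3f} bits")
print(f"soft update IG: {ig_soft:.3f} bits")
```

Under these assumptions, a question that halves the candidate set gains exactly one bit, while a softer belief shift gains less. This is the sense in which a question can be scored after the fact from the game transcript alone.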

Anthology ID: 2025.emnlp-main.876
Volume: Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month: November
Year: 2025
Address: Suzhou, China
Editors: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 17344–17360
URL: https://preview.aclanthology.org/name-variant-enfa-fane/2025.emnlp-main.876/
DOI: 10.18653/v1/2025.emnlp-main.876
Cite (ACL): Dylan Hutson, Daniel Vennemeyer, Aneesh Deshmukh, Justin Zhan, and Tianyu Jiang. 2025. GuessingGame: Measuring the Informativeness of Open-Ended Questions in Large Language Models. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 17344–17360, Suzhou, China. Association for Computational Linguistics.
Cite (Informal): GuessingGame: Measuring the Informativeness of Open-Ended Questions in Large Language Models (Hutson et al., EMNLP 2025)
PDF: https://preview.aclanthology.org/name-variant-enfa-fane/2025.emnlp-main.876.pdf
Checklist: 2025.emnlp-main.876.checklist.pdf
