Reference Games as a Testbed for the Alignment of Model Uncertainty and Clarification Requests

Manar Ali, Judith Sieker, Sina Zarrieß, Hendrik Buschmeier


Abstract
In human conversation, both interlocutors play an active role in maintaining mutual understanding. When listeners are uncertain about what speakers mean, for example, they can request clarification. Whether language models can assume a similar listener role, recognizing and expressing their own uncertainty through clarification, remains an open question. We argue that reference games are a suitable testbed for approaching this question because they are controlled and self-contained, and they make clarification needs explicit and measurable. To test this, we evaluate three vision-language models, comparing a baseline reference resolution task with an experiment in which the models are instructed to request clarification when uncertain. The results suggest that even in such simple tasks, models often struggle to recognize internal uncertainty and translate it into adequate clarification behavior. This demonstrates the value of reference games as testbeds for interaction qualities of (vision and) language models.
Anthology ID:
2026.gem-main.76
Volume:
Proceedings of the Fifth Workshop on Generation, Evaluation and Metrics (GEM)
Month:
July
Year:
2026
Address:
San Diego, California, USA
Editors:
Simon Mille, Sebastian Gehrmann, Patrícia Schmidtová, Ondřej Dušek, Marzieh Fadaee, Kyle Lo, Enrico Santus, Gabriel Stanovsky
Venues:
GEM | WS
Publisher:
Association for Computational Linguistics
Pages:
990–998
URL:
https://preview.aclanthology.org/ingest-acl-workshops/2026.gem-main.76/
Cite (ACL):
Manar Ali, Judith Sieker, Sina Zarrieß, and Hendrik Buschmeier. 2026. Reference Games as a Testbed for the Alignment of Model Uncertainty and Clarification Requests. In Proceedings of the Fifth Workshop on Generation, Evaluation and Metrics (GEM), pages 990–998, San Diego, California, USA. Association for Computational Linguistics.
Cite (Informal):
Reference Games as a Testbed for the Alignment of Model Uncertainty and Clarification Requests (Ali et al., GEM 2026)
PDF:
https://preview.aclanthology.org/ingest-acl-workshops/2026.gem-main.76.pdf