Learning and Evaluating Factual Clarification Question Generation Without Examples

Matthew Toles, Yukun Huang, Zhou Yu


Abstract
Real-world tasks such as giving legal or technical advice often depend on context that is missing at the outset. The ability to derive missing factual information by asking clarifying questions (ACQ) is an important element of real-life collaboration on such reasoning tasks. Although intent disambiguation has been heavily investigated, factual reasoning remains underexplored. To enable evaluation of clarification question generation in the factual domain, we present a new task that focuses on the ability to elicit missing information in multi-hop reasoning tasks. We observe that humans outperform GPT-4o by a large margin, while Llama 3 8B Instruct does not even beat the dummy baseline on some metrics. Finally, we find that by fine-tuning Llama 3 8B Instruct on its own generations filtered via rejection sampling, we can improve information recovery by 27.6% without using any manually labeled data.
Anthology ID:
2025.gem-1.15
Volume:
Proceedings of the Fourth Workshop on Generation, Evaluation and Metrics (GEM²)
Month:
July
Year:
2025
Address:
Vienna, Austria and virtual meeting
Editors:
Kaustubh Dhole, Miruna Clinciu
Venues:
GEM | WS
Publisher:
Association for Computational Linguistics
Pages:
200–211
URL:
https://preview.aclanthology.org/transition-to-people-yaml/2025.gem-1.15/
Cite (ACL):
Matthew Toles, Yukun Huang, and Zhou Yu. 2025. Learning and Evaluating Factual Clarification Question Generation Without Examples. In Proceedings of the Fourth Workshop on Generation, Evaluation and Metrics (GEM²), pages 200–211, Vienna, Austria and virtual meeting. Association for Computational Linguistics.
Cite (Informal):
Learning and Evaluating Factual Clarification Question Generation Without Examples (Toles et al., GEM 2025)
PDF:
https://preview.aclanthology.org/transition-to-people-yaml/2025.gem-1.15.pdf