Generating Questions Under Discussion with Reinforcement Learning using Ranking and Scoring for Reward and Evaluation

Kelvin Han, Claire Gardent


Abstract
There is growing research interest in Questions Under Discussion (QUD), a linguistic framework that represents discourse as natural language question-answer pairs, which are easily understandable and have been found useful in several applications. Our goal in this work is to improve the quality of automatic QUD generation. To sidestep the current paucity of annotated data, we propose a reinforcement learning-based approach using the Group Relative Policy Optimisation (GRPO) objective for LLM post-training on the task. To get there, we: (i) carefully investigated five promising methods for reference-free automatic QUD evaluation, (ii) proposed a novel prompting strategy, SCRS, which combines ranking and scoring with structured outputs and enables QUD evaluation close to the human upper bound, (iii) combined the findings from (i) and (ii) to distil knowledge from a very large LLM into a more resource-efficient reward model, which (iv) we then used in GRPO post-training of 3B LLMs on the QUD generation task. Our QUD generators produce overall higher-quality QUDs than the SOTA, which is based on supervised fine-tuning; all of this is achieved using only three annotated exemplars in the few-shot prompting for evaluation, and without the use of any other annotated questions for training the QUD generators. Our code, models, and annotated examples can be found at https://github.com/hankelvin/grpo_qud_generation.
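The GRPO objective referenced in the abstract normalises the reward of each sampled completion against the group of completions drawn for the same prompt. A minimal sketch of that group-relative advantage computation (illustrative only, not the authors' implementation; the function name is hypothetical and the rewards stand in for scores from a ranking-and-scoring reward model):

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-6):
    """Mean-centre and std-scale each completion's reward within its
    sampling group, as in the group-relative baseline used by GRPO."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Example: reward-model scores for four completions of one prompt.
advantages = group_relative_advantages([0.2, 0.8, 0.5, 0.5])
```

Completions scored above the group mean receive positive advantages and are reinforced; those below receive negative ones, so no separate value network or absolute reward scale is needed.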
Anthology ID:
2025.findings-ijcnlp.35
Volume:
Proceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics
Month:
December
Year:
2025
Address:
Mumbai, India
Editors:
Kentaro Inui, Sakriani Sakti, Haofen Wang, Derek F. Wong, Pushpak Bhattacharyya, Biplab Banerjee, Asif Ekbal, Tanmoy Chakraborty, Dhirendra Pratap Singh
Venue:
Findings
Publisher:
The Asian Federation of Natural Language Processing and The Association for Computational Linguistics
Pages:
589–615
URL:
https://preview.aclanthology.org/ingest-ijcnlp-aacl/2025.findings-ijcnlp.35/
Cite (ACL):
Kelvin Han and Claire Gardent. 2025. Generating Questions Under Discussion with Reinforcement Learning using Ranking and Scoring for Reward and Evaluation. In Proceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics, pages 589–615, Mumbai, India. The Asian Federation of Natural Language Processing and The Association for Computational Linguistics.
Cite (Informal):
Generating Questions Under Discussion with Reinforcement Learning using Ranking and Scoring for Reward and Evaluation (Han & Gardent, Findings 2025)
PDF:
https://preview.aclanthology.org/ingest-ijcnlp-aacl/2025.findings-ijcnlp.35.pdf