Kunze Li
2025
CrossQG: Improving Difficulty-Controllable Question Generation through Consistency Enhancement
Kunze Li | Yu Zhang
Findings of the Association for Computational Linguistics: EMNLP 2025
Automatically generating questions with controlled difficulty has great application value, especially in the field of education. Although large language models are capable of generating questions of various difficulty levels, the generated questions often fail to align with the given target difficulty. To mitigate this issue, we propose CrossQG, a novel question generation method that requires no tuning of generator parameters, yet significantly improves difficulty consistency. Specifically, CrossQG consists of two steps: (1) contrast enhancement, which leverages questions from different difficulty levels to enhance the base models’ understanding of the target difficulty, and (2) cross filtering, which compares generated questions across different difficulty levels and filters out those that do not meet the target difficulty. We evaluate CrossQG on three high-quality question answering datasets. Experimental results demonstrate that across multiple models, CrossQG significantly outperforms several mainstream methods, achieving superior consistency with the target difficulty and improving question quality. Notably, without any generator training, CrossQG surpasses supervised fine-tuning in many cases.
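As a rough illustration of the two steps named in the abstract, the Python sketch below mocks up a CrossQG-style pipeline. It is only a sketch under assumptions: the `llm` and `difficulty_score` callables, the prompt wording, and the nearest-level filtering rule are hypothetical stand-ins, not the paper's implementation.

```python
# Illustrative CrossQG-style pipeline (hypothetical, not the authors' code).
# `llm` is any frozen text generator; `difficulty_score` is any estimator
# that maps a question string to a scalar difficulty.
from typing import Callable, Dict, List

LEVELS = ["easy", "medium", "hard"]

def contrast_prompt(context: str, target: str) -> str:
    """Step 1 (contrast enhancement): describe the target difficulty by
    contrasting it with the other levels instead of naming it in isolation."""
    others = ", ".join(level for level in LEVELS if level != target)
    return (
        f"Context: {context}\n"
        f"Write one {target} question about the context. "
        f"It must be clearly distinguishable from a {others} question."
    )

def crossqg(llm: Callable[[str], str],
            difficulty_score: Callable[[str], float],
            context: str, target: str, n: int = 5) -> List[str]:
    # Generate candidates for every difficulty level so they can be compared.
    candidates: Dict[str, List[str]] = {
        level: [llm(contrast_prompt(context, level)) for _ in range(n)]
        for level in LEVELS
    }
    # Step 2 (cross filtering): estimate a difficulty center per level and
    # keep only target-level candidates that sit closest to the target pool.
    centers = {
        level: sum(map(difficulty_score, qs)) / len(qs)
        for level, qs in candidates.items()
    }
    return [
        q for q in candidates[target]
        if min(centers, key=lambda l: abs(difficulty_score(q) - centers[l])) == target
    ]
```

Filtering a candidate against the pools for all levels, rather than against a fixed threshold, mirrors the cross-level comparison described in the abstract; the scoring function itself is an assumption of this sketch.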
2024
Planning First, Question Second: An LLM-Guided Method for Controllable Question Generation
Kunze Li | Yu Zhang
Findings of the Association for Computational Linguistics: ACL 2024
In the field of education, for better assessment of students’ abilities, generated questions often need to meet experts’ requirements, indicating the need for controllable question generation (CQG). However, current CQG methods mainly focus on difficulty control, neglecting the control of question content and assessed abilities, which are also crucial in educational QG. In this paper, we propose an LLM-guided method PFQS (for Planning First, Question Second), which utilizes Llama 2 to generate an answer plan and then generates questions based on it. The plan not only includes candidate answers but also integrates the LLM’s understanding and multiple requirements, which makes question generation simple and controllable. We evaluate our approach on the FairytaleQA dataset, a well-structured QA dataset derived from child-friendly storybooks. In the dataset, the attribute label represents content control, while the local_or_sum and ex_or_im labels denote difficulty control. Experimental results demonstrate that our approach outperforms previous state-of-the-art results and achieves better consistency with requirements compared to prompt-based methods. Further application of our method to Llama 2 and Mistral also leads to improved requirement consistency in a zero-shot setting.
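The plan-first, question-second idea can be sketched in a few lines of Python. The prompt templates, the `llm` callable, and the two-stage split below are illustrative assumptions for this sketch; the paper's actual prompts and plan format are not reproduced here.

```python
# Hypothetical PFQS-style flow: build an answer plan first, then generate the
# question conditioned on that plan. `llm` is any instruction-following model.
from typing import Callable

def make_plan(llm: Callable[[str], str], story: str,
              attribute: str, local_or_sum: str, ex_or_im: str) -> str:
    """Stage 1: ask for an answer plan that lists candidate answers and
    restates the control requirements (content attribute, local/summary
    scope, explicit/implicit answer)."""
    return llm(
        f"Story: {story}\n"
        f"Draft an answer plan for a question whose attribute is '{attribute}', "
        f"whose scope is '{local_or_sum}', and whose answer type is '{ex_or_im}'. "
        f"List candidate answers and the facts they rely on."
    )

def make_question(llm: Callable[[str], str], story: str, plan: str) -> str:
    """Stage 2: generate the question from the plan, so the requirements are
    already baked into the input rather than enforced afterwards."""
    return llm(
        f"Story: {story}\nAnswer plan: {plan}\n"
        f"Write one question that follows the plan."
    )
```

Conditioning the second stage on an explicit plan is what makes the requirements (attribute, local_or_sum, ex_or_im) easy to control in this framing, since they only need to be satisfied once when the plan is drafted.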