Wen Hung Lee


2024

Automating True-False Multiple-Choice Question Generation and Evaluation with Retrieval-based Accuracy Differential
Chen-Jui Yu | Wen Hung Lee | Lin Tse Ke | Shih-Wei Guo | Yao-Chung Fan
Proceedings of the 17th International Natural Language Generation Conference

Creating high-quality True-False (TF) multiple-choice questions (MCQs), with accurate distractors, is a challenging and time-consuming task in education. This paper introduces True-False Distractor Generation (TFDG), a pipeline that leverages pre-trained language models and sentence retrieval techniques to automate the generation of TF-type MCQ distractors. Furthermore, the evaluation of generated TF questions presents a challenge. Traditional metrics like BLEU and ROUGE are unsuitable for this task. To address this, we propose a new evaluation metric called Retrieval-based Accuracy Differential (RAD). RAD assesses the discriminative power of TF questions by comparing model accuracy with and without access to reference texts. It quantitatively evaluates how well questions differentiate between students with varying knowledge levels. This research benefits educators and assessment developers, facilitating the efficient automatic generation of high-quality TF-type MCQs and their reliable evaluation.
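
The abstract describes RAD only as a comparison of model accuracy with and without access to reference texts and gives no formula. The sketch below is one plausible reading, assuming RAD is the plain difference between the two accuracies (with reference minus without). The names `Item`, `Answerer`, `accuracy`, `rad`, and `toy_answerer`, the toy data, and the direction of the subtraction are all illustrative assumptions, not the authors' implementation.

```python
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical item layout: (statement, gold_label, reference_text).
Item = Tuple[str, bool, str]
# Hypothetical answerer: takes a statement and an optional reference text,
# returns the predicted True/False label.
Answerer = Callable[[str, Optional[str]], bool]


def accuracy(items: Sequence[Item], answer: Answerer, use_reference: bool) -> float:
    """Fraction of items answered correctly, with or without the reference text."""
    if not items:
        raise ValueError("items must not be empty")
    correct = sum(
        answer(statement, reference if use_reference else None) == gold
        for statement, gold, reference in items
    )
    return correct / len(items)


def rad(items: Sequence[Item], answer: Answerer) -> float:
    """Assumed RAD: accuracy with reference minus accuracy without it."""
    return accuracy(items, answer, True) - accuracy(items, answer, False)


def toy_answerer(statement: str, reference: Optional[str]) -> bool:
    """Stand-in for a model: checks the reference if given, else guesses True."""
    if reference is None:
        return True
    return statement in reference


# Toy data, invented for illustration only.
ITEMS = [
    ("Water boils at 100 C at sea level.", True, "Water boils at 100 C at sea level."),
    ("Water boils at 50 C at sea level.", False, "Water boils at 100 C at sea level."),
    ("The sun is a star.", True, "The sun is a star."),
    ("The sun is a planet.", False, "The sun is a star."),
]

score = rad(ITEMS, toy_answerer)
```

Under this reading, a larger gap suggests that the questions are hard to answer by guessing but answerable with the relevant knowledge, which is the discriminative property the abstract attributes to RAD. In practice the answerer would be a language model paired with a sentence retriever rather than the substring check used here.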