@inproceedings{raina-etal-2023-assessing,
    title = "Assessing Distractors in Multiple-Choice Tests",
    author = "Raina, Vatsal  and
      Liusie, Adian  and
      Gales, Mark",
    editor = {Deutsch, Daniel  and
      Dror, Rotem  and
      Eger, Steffen  and
      Gao, Yang  and
      Leiter, Christoph  and
      Opitz, Juri  and
      R{\"u}ckl{\'e}, Andreas},
    booktitle = "Proceedings of the 4th Workshop on Evaluation and Comparison of NLP Systems",
    month = nov,
    year = "2023",
    address = "Bali, Indonesia",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.eval4nlp-1.2/",
    doi = "10.18653/v1/2023.eval4nlp-1.2",
    pages = "12--22",
    abstract = "Multiple-choice tests are a common approach for assessing candidates' comprehension skills. Standard multiple-choice reading comprehension exams require candidates to select the correct answer option from a discrete set based on a question in relation to a contextual passage. For appropriate assessment, the distractor answer options must by definition be incorrect but plausible and diverse. However, generating good quality distractors satisfying these criteria is a challenging task for content creators. We propose automated assessment metrics for the quality of distractors in multiple-choice reading comprehension tests. Specifically, we define quality in terms of the incorrectness, plausibility and diversity of the distractor options. We assess incorrectness using the classification ability of a binary multiple-choice reading comprehension system. Plausibility is assessed by considering the distractor confidence - the probability mass associated with the distractor options for a standard multi-class multiple-choice reading comprehension system. Diversity is assessed by pairwise comparison of an embedding-based equivalence metric between the distractors of a question. To further validate the plausibility metric we compare against candidate distributions over multiple-choice questions and agreement with a ChatGPT model{'}s interpretation of distractor plausibility and diversity."
}