D-GEN: Automatic Distractor Generation and Evaluation for Reliable Assessment of Generative Models

Grace Byun, Jinho D. Choi


Abstract
Evaluating generative models with open-ended generation is challenging due to inconsistencies in response formats. Multiple-choice (MC) evaluation mitigates this issue, but generating high-quality distractors is time-consuming and labor-intensive. We introduce D-GEN, the first open-source distractor generator model that transforms open-ended data into an MC format. To evaluate distractor quality, we propose two novel methods: 1) ranking alignment, ensuring generated distractors retain the discriminatory power of ground-truth distractors, and 2) entropy analysis, comparing model confidence distributions. Our results show that D-GEN preserves ranking consistency (Spearman's ρ = 0.99, Kendall's τ = 0.94) and closely matches the entropy distribution of ground-truth distractors. Human evaluation further confirms the fluency, coherence, distractiveness, and incorrectness of the generated distractors. Our work advances robust and efficient distractor generation with automated evaluation, setting a new standard for MC evaluation.
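As an illustrative sketch (not the paper's released code), the two evaluation ideas in the abstract can be computed as follows. The model accuracies and confidence distributions below are hypothetical placeholders; the metrics themselves (Spearman's ρ, Kendall's τ, Shannon entropy) are standard and implemented here in pure Python.

```python
import math

def ranks(xs):
    # Rank 1 = largest value (this toy data has no ties).
    order = sorted(range(len(xs)), key=lambda i: -xs[i])
    r = [0] * len(xs)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def spearman(x, y):
    # Spearman's rho via the rank-difference formula (no ties).
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))

def kendall(x, y):
    # Kendall's tau: (concordant - discordant) pairs over all pairs.
    n = len(x)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (x[i] - x[j]) * (y[i] - y[j])
            if s > 0:
                concordant += 1
            elif s < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)

def shannon_entropy(p):
    # Entropy (nats) of a probability distribution.
    return -sum(q * math.log(q) for q in p if q > 0)

# 1) Ranking alignment: do ground-truth and generated distractors
#    rank a set of evaluated models the same way? (hypothetical accuracies)
acc_ground_truth = [0.82, 0.75, 0.68, 0.61, 0.55]
acc_generated    = [0.80, 0.76, 0.62, 0.66, 0.54]
print(f"rho = {spearman(acc_ground_truth, acc_generated):.2f}")
print(f"tau = {kendall(acc_ground_truth, acc_generated):.2f}")

# 2) Entropy analysis: compare a model's confidence over the four
#    MC options under each distractor set (hypothetical softmax probs).
conf_ground_truth = [0.70, 0.15, 0.10, 0.05]
conf_generated    = [0.65, 0.18, 0.11, 0.06]
print(f"H(gt)  = {shannon_entropy(conf_ground_truth):.3f}")
print(f"H(gen) = {shannon_entropy(conf_generated):.3f}")
```

In this toy run the two accuracy lists disagree on one adjacent pair, so the correlations are below 1; similar entropy values for the two confidence distributions would indicate that the generated distractors confuse the model about as much as the ground-truth ones.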
Anthology ID:
2025.findings-acl.174
Volume:
Findings of the Association for Computational Linguistics: ACL 2025
Month:
July
Year:
2025
Address:
Vienna, Austria
Editors:
Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
3316–3349
URL:
https://preview.aclanthology.org/landing_page/2025.findings-acl.174/
Cite (ACL):
Grace Byun and Jinho D. Choi. 2025. D-GEN: Automatic Distractor Generation and Evaluation for Reliable Assessment of Generative Models. In Findings of the Association for Computational Linguistics: ACL 2025, pages 3316–3349, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
D-GEN: Automatic Distractor Generation and Evaluation for Reliable Assessment of Generative Models (Byun & Choi, Findings 2025)
PDF:
https://preview.aclanthology.org/landing_page/2025.findings-acl.174.pdf