D-GEN: Automatic Distractor Generation and Evaluation for Reliable Assessment of Generative Models

Grace Byun; Jinho D. Choi

Abstract

Evaluating generative models with open-ended generation is challenging due to inconsistencies in response formats. Multiple-choice (MC) evaluation mitigates this issue, but generating high-quality distractors is time-consuming and labor-intensive. We introduce D-GEN, the first open-source distractor generator model that transforms open-ended data into an MC format. To evaluate distractor quality, we propose two novel methods: 1) ranking alignment, which checks that generated distractors retain the discriminatory power of ground-truth distractors, and 2) entropy analysis, which compares model confidence distributions. Our results show that D-GEN preserves ranking consistency (Spearman’s ρ = 0.99, Kendall’s τ = 0.94) and closely matches the entropy distribution of ground-truth distractors. Human evaluation further confirms the fluency, coherence, distractiveness, and incorrectness of the generated distractors. Our work advances robust and efficient distractor generation with automated evaluation, setting a new standard for MC evaluation.
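
The two evaluation ideas named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the no-ties assumption, and the toy accuracy values are hypothetical and chosen only for illustration. Ranking alignment is shown as rank correlation between model accuracies measured with ground-truth distractors and with generated distractors; entropy analysis is shown as the Shannon entropy of a model's probability distribution over answer options.

```python
import math
from itertools import combinations


def ranks(scores):
    """Return ranks (1 = highest score). Assumes no tied scores."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    result = [0] * len(scores)
    for position, idx in enumerate(order, start=1):
        result[idx] = position
    return result


def spearman_rho(x, y):
    """Spearman's rank correlation for tie-free data."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))


def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


def entropy_bits(probs):
    """Shannon entropy (in bits) of a distribution over answer options."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


# Hypothetical accuracies of five models on the same questions,
# once with ground-truth distractors and once with generated ones.
acc_ground_truth = [0.81, 0.74, 0.66, 0.52, 0.40]
acc_generated = [0.79, 0.75, 0.55, 0.61, 0.38]

print(spearman_rho(acc_ground_truth, acc_generated))
print(kendall_tau(acc_ground_truth, acc_generated))
print(entropy_bits([0.25, 0.25, 0.25, 0.25]))  # uniform over four options
```

In this toy setting, correlations near 1 would indicate that the generated distractors rank models in almost the same order as the ground-truth distractors, and a similar entropy across the two distractor sets would indicate that the model is comparably uncertain under both.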

Anthology ID: 2025.findings-acl.174
Volume: Findings of the Association for Computational Linguistics: ACL 2025
Month: July
Year: 2025
Address: Vienna, Austria
Editors: Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 3316–3349
URL: https://preview.aclanthology.org/landing_page/2025.findings-acl.174/
Cite (ACL): Grace Byun and Jinho D. Choi. 2025. D-GEN: Automatic Distractor Generation and Evaluation for Reliable Assessment of Generative Models. In Findings of the Association for Computational Linguistics: ACL 2025, pages 3316–3349, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal): D-GEN: Automatic Distractor Generation and Evaluation for Reliable Assessment of Generative Models (Byun & Choi, Findings 2025)
PDF: https://preview.aclanthology.org/landing_page/2025.findings-acl.174.pdf
