Toward Automated Evaluation of AI-Generated Item Drafts in Clinical Assessment
Tazin Afrin, Le An Ha, Victoria Yaneva, Keelan Evanini, Steven Go, Kristine DeRuchie, Michael Heilig
Abstract
This study examines the classification of AI-generated clinical multiple-choice question drafts as “helpful” or “non-helpful” starting points. Expert judgments were analyzed, and multiple classifiers were evaluated, including feature-based models, fine-tuned transformers, and few-shot prompting with GPT-4. Our findings highlight the challenges and considerations involved in evaluating AI-generated items in clinical test development.
- Anthology ID:
- 2025.aimecon-main.19
- Volume:
- Proceedings of the Artificial Intelligence in Measurement and Education Conference (AIME-Con): Full Papers
- Month:
- October
- Year:
- 2025
- Address:
- Wyndham Grand Pittsburgh, Downtown, Pittsburgh, Pennsylvania, United States
- Editors:
- Joshua Wilson, Christopher Ormerod, Magdalen Beiting Parrish
- Venue:
- AIME-Con
- Publisher:
- National Council on Measurement in Education (NCME)
- Pages:
- 172–182
- URL:
- https://preview.aclanthology.org/ingest-emnlp/2025.aimecon-main.19/
- Cite (ACL):
- Tazin Afrin, Le An Ha, Victoria Yaneva, Keelan Evanini, Steven Go, Kristine DeRuchie, and Michael Heilig. 2025. Toward Automated Evaluation of AI-Generated Item Drafts in Clinical Assessment. In Proceedings of the Artificial Intelligence in Measurement and Education Conference (AIME-Con): Full Papers, pages 172–182, Wyndham Grand Pittsburgh, Downtown, Pittsburgh, Pennsylvania, United States. National Council on Measurement in Education (NCME).
- Cite (Informal):
- Toward Automated Evaluation of AI-Generated Item Drafts in Clinical Assessment (Afrin et al., AIME-Con 2025)
- PDF:
- https://preview.aclanthology.org/ingest-emnlp/2025.aimecon-main.19.pdf