Corey Palermo


2025

Operational Alignment of Confidence-Based Flagging Methods in Automated Scoring
Corey Palermo | Troy Chen | Arianto Wibowo
Proceedings of the Artificial Intelligence in Measurement and Education Conference (AIME-Con): Coordinated Session Papers

In hybrid scoring systems, confidence thresholds determine which responses receive human review. This study evaluates a relative (within-batch) thresholding method against an absolute benchmark across ten items. Results show near-perfect agreement and modest distributional differences, supporting the relative method’s validity as a scalable, operationally viable approach for flagging low-confidence responses.
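To make the comparison concrete, here is a minimal Python sketch of the two flagging styles the abstract contrasts. It is an illustration under assumptions, not the authors' procedure: the abstract does not define either method, so the function names, the cutoff of 0.60, the 30% flag rate, and the sample confidence values are all hypothetical. The absolute rule flags every response whose confidence falls below a fixed cutoff; the relative rule flags the lowest-confidence share of each batch.

```python
from typing import List


def flag_absolute(confidences: List[float], threshold: float) -> List[bool]:
    """Flag responses whose confidence is below a fixed cutoff (assumed rule)."""
    return [c < threshold for c in confidences]


def flag_relative(confidences: List[float], fraction: float) -> List[bool]:
    """Flag the lowest-confidence `fraction` of the batch (assumed rule)."""
    n = len(confidences)
    k = round(n * fraction)  # number of responses to send to human review
    # Indices ordered from least to most confident; ties keep batch order.
    order = sorted(range(n), key=lambda i: confidences[i])
    flagged = set(order[:k])
    return [i in flagged for i in range(n)]


def agreement(a: List[bool], b: List[bool]) -> float:
    """Share of responses on which two flagging decisions coincide."""
    if not a or len(a) != len(b):
        raise ValueError("flag lists must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


# Hypothetical confidence values for one batch of ten responses.
batch = [0.95, 0.40, 0.88, 0.55, 0.99, 0.72, 0.91, 0.35, 0.80, 0.97]

abs_flags = flag_absolute(batch, threshold=0.60)
rel_flags = flag_relative(batch, fraction=0.30)
exact_agreement = agreement(abs_flags, rel_flags)
```

In this toy batch the two rules flag the same three responses. Lowering the relative rate to 20% would drop one of them, which is the kind of divergence an agreement statistic is meant to surface.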