Jason Snyder
2026
Policy-Sensitive Fairness Evaluation in Automated Scoring of Clinical Communication
Saed Rezayi | Le An Ha | Victoria Yaneva | Polina Harik | Janet Mee | Jason Snyder
Proceedings of the 21st Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2026)
This study examines automated scoring fairness in a formative assessment context: the automated evaluation of medical students’ communication skills. Building on the premise that definitions of fairness are value-dependent, we investigate how conclusions about group differences may vary under different weighting schemes for false positives (FPs) and false negatives (FNs). Results show that when errors are treated symmetrically, no statistically significant differences are observed across demographic groups based on race or gender. This pattern remains stable when error weights are varied: a small number of isolated differences appear under moderate FN weighting, but no consistent or robust disparities emerge. Overall, the findings suggest that fairness conclusions in this setting are relatively robust to variations in error weighting. At the same time, the study highlights the importance of making value assumptions explicit when evaluating automated scoring systems, particularly in formative contexts where error trade-offs carry pedagogical implications for feedback, learner engagement, and educational equity.
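The idea of re-weighting FPs and FNs before comparing groups can be illustrated with a minimal sketch. The abstract does not specify the metric used in the study, so the function `weighted_error_by_group`, its parameters `fn_weight` and `fp_weight`, and the toy labels below are all hypothetical and serve only to show how a group gap might depend on the chosen weights. No significance testing is included.

```python
from collections import defaultdict


def weighted_error_by_group(y_true, y_pred, groups, fn_weight=1.0, fp_weight=1.0):
    """Mean weighted misclassification cost per group.

    Hypothetical metric for illustration only; not the paper's definition.
    A false negative (true 1, predicted 0) costs fn_weight,
    a false positive (true 0, predicted 1) costs fp_weight.
    """
    cost = defaultdict(float)
    count = defaultdict(int)
    for t, p, g in zip(y_true, y_pred, groups):
        count[g] += 1
        if t == 1 and p == 0:
            cost[g] += fn_weight
        elif t == 0 and p == 1:
            cost[g] += fp_weight
    return {g: cost[g] / count[g] for g in count}


# Toy data (invented): group A has one FN, group B has one FP.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [0, 1, 0, 0, 1, 1, 1, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

# Symmetric weighting: both error types cost the same.
symmetric = weighted_error_by_group(y_true, y_pred, groups)

# FN-heavy weighting: a missed positive costs three times a false alarm.
fn_heavy = weighted_error_by_group(y_true, y_pred, groups, fn_weight=3.0)

print("symmetric:", symmetric)
print("FN-heavy: ", fn_heavy)
```

In this invented example the two groups have identical error rates under symmetric weights, while a gap opens once FNs are weighted more heavily. That is the kind of sensitivity the study checks for, and which it reports as largely absent in its own data.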