Policy-Sensitive Fairness Evaluation in Automated Scoring of Clinical Communication
Saed Rezayi, Le An Ha, Victoria Yaneva, Polina Harik, Janet Mee, Jason Snyder
Abstract
This study examines automated scoring fairness in a formative assessment context: the automated evaluation of medical students’ communication skills. Building on the premise that definitions of fairness are value-dependent, we investigate how conclusions about group differences may vary under different weighting schemes for false positives (FPs) and false negatives (FNs). Results show that when errors are treated symmetrically, no statistically significant differences are observed across demographic groups based on race or gender. This pattern remains stable when error weights are varied, with no consistent or robust disparities emerging. A small number of isolated differences appear under moderate FN weighting. Overall, the findings suggest that fairness conclusions in this setting are relatively robust to variations in error weighting. At the same time, the study highlights the importance of making value assumptions explicit when evaluating automated scoring systems, particularly in formative contexts where error trade-offs carry pedagogical implications for feedback, learner engagement, and educational equity.
- Anthology ID:
- 2026.bea-1.40
- Volume:
- Proceedings of the 21st Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2026)
- Month:
- July
- Year:
- 2026
- Address:
- San Diego, California, USA
- Editors:
- Ekaterina Kochmar, Bashar Alhafni, Stefano Bannò, Marie Bexte, Jill Burstein, Andrea Horbach, Ronja Laarmann-Quante, Anais Tack, Victoria Yaneva, Zheng Yuan
- Venues:
- BEA | WS
- Publisher:
- Association for Computational Linguistics
- Pages:
- 574–580
- URL:
- https://preview.aclanthology.org/ingest-acl-workshops/2026.bea-1.40/
- Cite (ACL):
- Saed Rezayi, Le An Ha, Victoria Yaneva, Polina Harik, Janet Mee, and Jason Snyder. 2026. Policy-Sensitive Fairness Evaluation in Automated Scoring of Clinical Communication. In Proceedings of the 21st Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2026), pages 574–580, San Diego, California, USA. Association for Computational Linguistics.
- Cite (Informal):
- Policy-Sensitive Fairness Evaluation in Automated Scoring of Clinical Communication (Rezayi et al., BEA 2026)
- PDF:
- https://preview.aclanthology.org/ingest-acl-workshops/2026.bea-1.40.pdf