Joseph Tey
2026
Interpretability from the Ground Up: Stakeholder-Centric Design of Automated Scoring in Educational Assessments
Yunsung Kim | Michael Hardy | Joseph Tey | Candace Thille | Christopher J Piech
Findings of the Association for Computational Linguistics: ACL 2026
AI-driven automated scoring systems offer a scalable and efficient means of evaluating complex student-generated responses. However, despite growing demand for transparency and interpretability, the field has not yet developed a widely accepted solution for interpretable automated scoring in large-scale, real-world assessments. This work takes a principled approach to this challenge. We analyze the needs and potential benefits of interpretable automated scoring for various assessment stakeholder groups and, targeting those needs, develop four principles of interpretability: (F)aithfulness, (G)roundedness, (T)raceability, and (I)nterchangeability (FGTI). To illustrate that these principles can be implemented in practice, we develop AnalyticScore as a baseline reference framework. When applied to text-based constructed-response scoring, AnalyticScore outperforms many uninterpretable scoring methods in scoring accuracy and is, on average, within 0.06 quadratic weighted kappa (QWK) of the uninterpretable state of the art (SOTA) across 10 items from the ASAP-SAS dataset. By comparing AnalyticScore against human annotators performing the same featurization task, we further demonstrate that its featurization behavior aligns well with that of humans.
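The accuracy comparison above is reported in quadratic weighted kappa (QWK), an agreement measure between two raters (e.g. a model and a human scorer) on an ordinal score scale. As a reading aid, the following is a minimal sketch of the standard QWK definition: Cohen's kappa with disagreement weights (i - j)^2 / (N - 1)^2. It is not the authors' evaluation code; the function name quadratic_weighted_kappa and its explicit min_score/max_score parameters are illustrative assumptions.

```python
from typing import Sequence


def quadratic_weighted_kappa(rater_a: Sequence[int], rater_b: Sequence[int],
                             min_score: int, max_score: int) -> float:
    """Cohen's kappa with quadratic disagreement weights (illustrative sketch)."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("ratings must be non-empty and of equal length")
    if max_score < min_score:
        raise ValueError("max_score must be >= min_score")
    for s in list(rater_a) + list(rater_b):
        # Reject out-of-range scores (negative indices would otherwise wrap silently).
        if not min_score <= s <= max_score:
            raise ValueError(f"score {s} outside [{min_score}, {max_score}]")

    n = max_score - min_score + 1
    # Observed matrix: observed[i][j] counts responses scored i by A and j by B.
    observed = [[0.0] * n for _ in range(n)]
    for a, b in zip(rater_a, rater_b):
        observed[a - min_score][b - min_score] += 1
    total = float(len(rater_a))

    # Marginal score histograms of each rater.
    hist_a = [sum(row) for row in observed]
    hist_b = [sum(col) for col in zip(*observed)]

    numerator = 0.0    # weighted observed disagreement
    denominator = 0.0  # weighted disagreement expected by chance
    for i in range(n):
        for j in range(n):
            weight = (i - j) ** 2 / (n - 1) ** 2 if n > 1 else 0.0
            expected = hist_a[i] * hist_b[j] / total
            numerator += weight * observed[i][j]
            denominator += weight * expected

    # Convention: if chance disagreement is zero, both raters used one identical
    # score throughout, so agreement is treated as perfect.
    return 1.0 - numerator / denominator if denominator else 1.0


print(quadratic_weighted_kappa([0, 1, 2, 3], [0, 1, 2, 3], 0, 3))  # perfect agreement
print(quadratic_weighted_kappa([0, 2], [2, 0], 0, 2))              # maximal disagreement
```

The score bounds are passed explicitly so that a score level neither rater used still counts toward the weight scale. QWK equals 1 for perfect agreement, is around 0 for chance-level agreement, and is negative for systematic disagreement, so a gap of 0.06 QWK refers to this scale. In practice, scikit-learn's cohen_kappa_score with weights="quadratic" should compute the same quantity when given the same label set.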