Interpretability from the Ground Up: Stakeholder-Centric Design of Automated Scoring in Educational Assessments

Yunsung Kim, Michael Hardy, Joseph Tey, Candace Thille, Christopher J Piech


Abstract
AI-driven automated scoring systems offer scalable and efficient means of evaluating complex student-generated responses. However, despite increasing demand for transparency and interpretability, the field has yet to develop a widely accepted solution for interpretable automated scoring that can be used in large-scale real-world assessments. This work takes a principled approach to address this challenge. We analyze the needs and potential benefits of interpretable automated scoring for various assessment stakeholder groups and develop four principles of interpretability – (F)aithfulness, (G)roundedness, (T)raceability, and (I)nterchangeability (FGTI) – targeted at those needs. To illustrate the feasibility of implementing these principles, we develop the AnalyticScore framework as a baseline reference framework. When applied to the domain of text-based constructed-response scoring, AnalyticScore outperforms many uninterpretable scoring methods in terms of scoring accuracy and is, on average, within 0.06 quadratic weighted kappa (QWK) of the uninterpretable state of the art (SOTA) across 10 items from the ASAP-SAS dataset. By comparing against human annotators conducting the same featurization task, we further demonstrate that the featurization behavior of AnalyticScore aligns well with that of humans.
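For readers unfamiliar with the metric, QWK in the abstract refers to quadratic weighted kappa, a chance-corrected agreement measure commonly used to compare automated scores against human scores. The following is a minimal sketch of the standard formula, not the authors' evaluation code; the function name, argument names, and the choice to return 1.0 in the degenerate zero-denominator case are assumptions made for illustration.

```python
from collections import Counter


def quadratic_weighted_kappa(rater_a, rater_b, min_score, max_score):
    """Chance-corrected agreement between two lists of integer scores.

    Hypothetical helper for illustration: 1.0 means perfect agreement,
    0.0 means agreement at chance level, negative values mean worse than chance.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("score lists must be non-empty and of equal length")

    n = len(rater_a)
    span = max_score - min_score
    if span == 0:
        return 1.0  # only one possible score, so raters cannot disagree

    observed = Counter(zip(rater_a, rater_b))  # joint counts of (a, b) pairs
    hist_a = Counter(rater_a)                  # marginal counts for rater A
    hist_b = Counter(rater_b)                  # marginal counts for rater B

    numerator = 0.0    # weighted observed disagreement
    denominator = 0.0  # weighted disagreement expected by chance
    for i in range(min_score, max_score + 1):
        for j in range(min_score, max_score + 1):
            weight = ((i - j) ** 2) / (span ** 2)  # quadratic penalty
            expected = hist_a[i] * hist_b[j] / n
            numerator += weight * observed[(i, j)]
            denominator += weight * expected

    if denominator == 0:
        return 1.0  # assumption: both raters gave one identical constant score
    return 1.0 - numerator / denominator
```

Under this definition, a gap of 0.06 QWK between two systems means their kappa values against the same human scores differ by 0.06 on a scale whose maximum is 1.0.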
Anthology ID:
2026.findings-acl.1859
Volume:
Findings of the Association for Computational Linguistics: ACL 2026
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
37313–37328
URL:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1859/
Cite (ACL):
Yunsung Kim, Michael Hardy, Joseph Tey, Candace Thille, and Christopher J Piech. 2026. Interpretability from the Ground Up: Stakeholder-Centric Design of Automated Scoring in Educational Assessments. In Findings of the Association for Computational Linguistics: ACL 2026, pages 37313–37328, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
Interpretability from the Ground Up: Stakeholder-Centric Design of Automated Scoring in Educational Assessments (Kim et al., Findings 2026)
PDF:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1859.pdf
Checklist:
 2026.findings-acl.1859.checklist.pdf