Interpretability from the Ground Up: Stakeholder-Centric Design of Automated Scoring in Educational Assessments

Yunsung Kim; Michael Hardy; Joseph Tey; Candace Thille; Christopher J Piech

Abstract

AI-driven automated scoring systems offer a scalable and efficient means of evaluating complex student-generated responses. Yet, despite increasing demand for transparency and interpretability, the field has yet to develop a widely accepted solution for interpretable automated scoring in large-scale, real-world assessments. This work takes a principled approach to this challenge. We analyze the needs and potential benefits of interpretable automated scoring for various assessment stakeholder groups and develop four principles of interpretability targeted at those needs: (F)aithfulness, (G)roundedness, (T)raceability, and (I)nterchangeability (FGTI). To illustrate the feasibility of implementing these principles, we develop AnalyticScore as a baseline reference framework. When applied to text-based constructed-response scoring, AnalyticScore outperforms many uninterpretable scoring methods in scoring accuracy and is, on average, within 0.06 quadratic weighted kappa (QWK) of the uninterpretable state of the art (SOTA) across 10 items from the ASAP-SAS dataset. By comparing against human annotators performing the same featurization task, we further show that the featurization behavior of AnalyticScore aligns well with that of humans.
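The accuracy results above are reported in QWK (quadratic weighted kappa), an agreement statistic between automated and human scores on an ordinal scale. The sketch below computes the textbook definition of QWK for integer scores in a fixed range; it is an illustration of the metric, not the authors' evaluation code, and the function name `quadratic_weighted_kappa` and its parameters are hypothetical.

```python
def quadratic_weighted_kappa(rater_a, rater_b, min_score, max_score):
    """Cohen's kappa with quadratic disagreement weights (QWK).

    Hypothetical helper for illustration: rater_a and rater_b are equal-length
    lists of integer scores in [min_score, max_score].
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("score lists must be non-empty and of equal length")
    if max_score <= min_score:
        raise ValueError("need at least two score categories")
    for s in list(rater_a) + list(rater_b):
        if not (min_score <= s <= max_score):
            raise ValueError(f"score {s} outside [{min_score}, {max_score}]")

    n_cat = max_score - min_score + 1
    n = len(rater_a)

    # Observed confusion matrix: observed[i][j] counts items scored i by A and j by B.
    observed = [[0] * n_cat for _ in range(n_cat)]
    for a, b in zip(rater_a, rater_b):
        observed[a - min_score][b - min_score] += 1

    # Marginal score histograms for each rater.
    hist_a = [sum(row) for row in observed]
    hist_b = [sum(observed[i][j] for i in range(n_cat)) for j in range(n_cat)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(n_cat):
        for j in range(n_cat):
            # Quadratic weight: larger score gaps are penalized more heavily.
            w = (i - j) ** 2 / (n_cat - 1) ** 2
            # Expected count under independent raters, scaled to the same total n.
            expected = hist_a[i] * hist_b[j] / n
            weighted_observed += w * observed[i][j]
            weighted_expected += w * expected

    if weighted_expected == 0:
        raise ValueError("kappa is undefined when no disagreement is expected")
    return 1.0 - weighted_observed / weighted_expected
```

For example, `quadratic_weighted_kappa([0, 0, 1, 1], [0, 1, 1, 1], 0, 1)` returns 0.5. A QWK of 1.0 means perfect agreement and 0 means chance-level agreement, so a gap of 0.06 is a difference measured on this agreement scale.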

Anthology ID:: 2026.findings-acl.1859
Volume:: Findings of the Association for Computational Linguistics: ACL 2026
Month:: July
Year:: 2026
Address:: San Diego, California, United States
Editors:: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:: Findings
Publisher:: Association for Computational Linguistics
Pages:: 37313–37328
URL:: https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1859/
Cite (ACL):: Yunsung Kim, Michael Hardy, Joseph Tey, Candace Thille, and Christopher J Piech. 2026. Interpretability from the Ground Up: Stakeholder-Centric Design of Automated Scoring in Educational Assessments. In Findings of the Association for Computational Linguistics: ACL 2026, pages 37313–37328, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):: Interpretability from the Ground Up: Stakeholder-Centric Design of Automated Scoring in Educational Assessments (Kim et al., Findings 2026)
PDF:: https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1859.pdf
Checklist:: 2026.findings-acl.1859.checklist.pdf