Interpretability from the Ground Up: Stakeholder-Centric Design of Automated Scoring in Educational Assessments
Yunsung Kim, Michael Hardy, Joseph Tey, Candace Thille, Christopher J Piech
Abstract
AI-driven automated scoring systems offer scalable and efficient means of evaluating complex student-generated responses. Yet, despite increasing demand for transparency and interpretability, the field has yet to develop a widely accepted solution for interpretable automated scoring to be used in large-scale real-world assessments. This work takes a principled approach to address this challenge. We analyze the needs and potential benefits of interpretable automated scoring for various assessment stakeholder groups and develop four principles of interpretability – (F)aithfulness, (G)roundedness, (T)raceability, and (I)nterchangeability (FGTI) – targeted at those needs. To illustrate the feasibility of implementing these principles, we develop the AnalyticScore framework as a baseline reference framework. When applied to the domain of text-based constructed-response scoring, AnalyticScore outperforms many uninterpretable scoring methods in terms of scoring accuracy and is, on average, within 0.06 QWK of the uninterpretable SOTA across 10 items from the ASAP-SAS dataset. By comparing against human annotators conducting the same featurization task, we further demonstrate that the featurization behavior of AnalyticScore aligns well with that of humans.
- Anthology ID:
- 2026.findings-acl.1859
- Volume:
- Findings of the Association for Computational Linguistics: ACL 2026
- Month:
- July
- Year:
- 2026
- Address:
- San Diego, California, United States
- Editors:
- Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
- Venue:
- Findings
- Publisher:
- Association for Computational Linguistics
- Pages:
- 37313–37328
- URL:
- https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1859/
- Cite (ACL):
- Yunsung Kim, Michael Hardy, Joseph Tey, Candace Thille, and Christopher J Piech. 2026. Interpretability from the Ground Up: Stakeholder-Centric Design of Automated Scoring in Educational Assessments. In Findings of the Association for Computational Linguistics: ACL 2026, pages 37313–37328, San Diego, California, United States. Association for Computational Linguistics.
- Cite (Informal):
- Interpretability from the Ground Up: Stakeholder-Centric Design of Automated Scoring in Educational Assessments (Kim et al., Findings 2026)
- PDF:
- https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1859.pdf
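
The abstract reports scoring accuracy in QWK (quadratic weighted kappa), an agreement statistic between two sets of ordinal scores. As a reading aid, here is a minimal sketch of the standard QWK definition in plain Python. It is not taken from the paper: the function name `quadratic_weighted_kappa`, its parameters, the toy score lists, and the handling of the degenerate case are all assumptions made for illustration, and the authors' evaluation code may differ.

```python
from collections import Counter


def quadratic_weighted_kappa(rater_a, rater_b, min_score, max_score):
    """Standard QWK between two equal-length lists of integer scores.

    Hypothetical helper for illustration; not the paper's implementation.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("score lists must be non-empty and of equal length")
    n_levels = max_score - min_score + 1
    if n_levels < 2:
        raise ValueError("need at least two score levels")

    n = len(rater_a)
    observed = Counter(zip(rater_a, rater_b))  # joint counts O[i, j]
    hist_a = Counter(rater_a)                  # marginal counts for rater A
    hist_b = Counter(rater_b)                  # marginal counts for rater B

    numerator = 0.0    # sum of w[i, j] * O[i, j]
    denominator = 0.0  # sum of w[i, j] * E[i, j]
    for i in range(min_score, max_score + 1):
        for j in range(min_score, max_score + 1):
            # Quadratic penalty, normalized to lie in [0, 1].
            weight = (i - j) ** 2 / (n_levels - 1) ** 2
            # Expected count if the two raters were independent.
            expected = hist_a[i] * hist_b[j] / n
            numerator += weight * observed[(i, j)]
            denominator += weight * expected

    if denominator == 0.0:
        # Both raters gave the same single score everywhere; kappa is
        # formally undefined. Returning 1.0 is a choice made in this sketch.
        return 1.0
    return 1.0 - numerator / denominator


# Toy data (made up): four responses scored on a 0-3 scale.
human_scores = [0, 1, 2, 3]
model_scores = [0, 1, 2, 2]
print(quadratic_weighted_kappa(human_scores, model_scores, 0, 3))
```

A value of 1 means perfect agreement, 0 means agreement no better than chance, and negative values mean systematic disagreement. Under this definition, "within 0.06 QWK" describes a gap of at most 0.06 on this scale, averaged over the items.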