Do Deep Neural Nets Display Human-like Attention in Short Answer Scoring?

Zijie Zeng, Xinyu Li, Dragan Gasevic, Guanliang Chen


Abstract
Deep Learning (DL) techniques have been increasingly adopted for Automatic Text Scoring in education. However, these techniques often cannot explain or justify how a prediction is made, which inevitably decreases their trustworthiness and hinders educators from embracing them in practice. This study investigated whether (and to what extent) DL-based graders align with human graders regarding the important words they identify when marking short answer questions. To this end, we first conducted a user study in which human graders manually annotated the words they considered important in assessing answer quality, and then measured the overlap between these human-annotated words and those identified by DL-based graders (i.e., those receiving large attention weights). Furthermore, we ran a randomized controlled experiment to explore the impact of highlighting important words detected by DL-based graders on human grading. The results showed that: (i) DL-based graders, to a certain degree, displayed alignment with human graders regardless of whether DL-based graders and human graders agreed on the quality of an answer; and (ii) it is possible to facilitate human grading by highlighting the DL-detected important words, though further investigation is necessary to understand how human graders exploit such highlighted words.
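The overlap measurement described in the abstract can be illustrated with a minimal sketch. The abstract does not state which overlap metric or selection rule the authors used, so everything below is an assumption made for illustration: the top-k selection rule, the helper names `top_k_words` and `overlap_scores`, and the toy tokens and weights are all hypothetical.

```python
def top_k_words(tokens, weights, k):
    """Return the set of the k tokens with the largest attention weights.

    Assumption: 'important' model words are taken as the top-k by weight;
    the paper may use a different selection rule.
    """
    ranked = sorted(zip(tokens, weights), key=lambda pair: pair[1], reverse=True)
    return {token for token, _ in ranked[:k]}


def overlap_scores(model_words, human_words):
    """Return (precision, recall, jaccard) between two sets of words."""
    shared = model_words & human_words
    union = model_words | human_words
    precision = len(shared) / len(model_words) if model_words else 0.0
    recall = len(shared) / len(human_words) if human_words else 0.0
    jaccard = len(shared) / len(union) if union else 0.0
    return precision, recall, jaccard


# Toy student answer with made-up attention weights (illustrative only).
tokens = ["plants", "use", "sunlight", "to", "make", "food"]
weights = [0.30, 0.05, 0.35, 0.02, 0.08, 0.20]

# Words a hypothetical human grader marked as important.
human_words = {"sunlight", "food", "water"}

model_words = top_k_words(tokens, weights, k=3)
precision, recall, jaccard = overlap_scores(model_words, human_words)
print(sorted(model_words), precision, recall, jaccard)
```

Set-based overlap ignores repeated tokens and word position; a position-aware comparison would need token indices rather than word strings.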
Anthology ID:
2022.naacl-main.14
Volume:
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Month:
July
Year:
2022
Address:
Seattle, United States
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
191–205
URL:
https://aclanthology.org/2022.naacl-main.14
DOI:
10.18653/v1/2022.naacl-main.14
Cite (ACL):
Zijie Zeng, Xinyu Li, Dragan Gasevic, and Guanliang Chen. 2022. Do Deep Neural Nets Display Human-like Attention in Short Answer Scoring? In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 191–205, Seattle, United States. Association for Computational Linguistics.
Cite (Informal):
Do Deep Neural Nets Display Human-like Attention in Short Answer Scoring? (Zeng et al., NAACL 2022)
PDF:
https://preview.aclanthology.org/auto-file-uploads/2022.naacl-main.14.pdf
Software:
2022.naacl-main.14.software.zip
Video:
https://preview.aclanthology.org/auto-file-uploads/2022.naacl-main.14.mp4