@inproceedings{wang-etal-2024-beyond-agreement,
title = "Beyond Agreement: Diagnosing the Rationale Alignment of Automated Essay Scoring Methods based on Linguistically-informed Counterfactuals",
author = "Wang, Yupei and
Hu, Renfen and
Zhao, Zhe",
editor = "Al-Onaizan, Yaser and
Bansal, Mohit and
Chen, Yun-Nung",
booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2024",
month = nov,
year = "2024",
address = "Miami, Florida, USA",
publisher = "Association for Computational Linguistics",
url = "https://preview.aclanthology.org/add-emnlp-2024-awards/2024.findings-emnlp.520/",
doi = "10.18653/v1/2024.findings-emnlp.520",
pages = "8906--8925",
abstract = "While current Automated Essay Scoring (AES) methods demonstrate high scoring agreement with human raters, their decision-making mechanisms are not fully understood. Our proposed method, using counterfactual intervention assisted by Large Language Models (LLMs), reveals that BERT-like models primarily focus on sentence-level features, whereas LLMs such as GPT-3.5, GPT-4 and Llama-3 are sensitive to conventions {\&} accuracy, language complexity, and organization, indicating a more comprehensive rationale alignment with scoring rubrics. Moreover, LLMs can discern counterfactual interventions when giving feedback on essays. Our approach improves understanding of neural AES methods and can also apply to other domains seeking transparency in model-driven decisions."
}