Abstract
Recent advances in automated essay scoring (AES) have shifted towards evaluating multiple traits to provide enriched feedback. Like typical AES systems, multi-trait AES employs the quadratic weighted kappa (QWK) to measure agreement with human raters, aligning closely with the rating schema; however, its non-differentiable nature prevents its direct use in neural network training. In this paper, we propose Scoring-aware Multi-reward Reinforcement Learning (SaMRL), which integrates actual evaluation schemes into the training process by designing QWK-based rewards with a mean-squared error penalty for multi-trait AES. Existing reinforcement learning (RL) applications in AES are limited to classification models despite the associated performance degradation, as RL requires probability distributions; instead, we adopt an autoregressive score generation framework to leverage token generation probabilities for robust multi-trait score predictions. Empirical analyses demonstrate that SaMRL facilitates model training, notably enhancing scoring on prompts where prior models underperformed.
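A minimal sketch of how a QWK-based reward with an MSE penalty could be computed is shown below. This is an illustrative reconstruction, not the paper's implementation: the function names, the subtractive reward combination, and the `beta` weight are assumptions not taken from the source.

```python
import numpy as np

def quadratic_weighted_kappa(y_true, y_pred, min_rating, max_rating):
    """Standard QWK between integer rating vectors (the AES agreement metric)."""
    n = max_rating - min_rating + 1
    # Observed rating co-occurrence (confusion) matrix
    observed = np.zeros((n, n))
    for t, p in zip(y_true, y_pred):
        observed[t - min_rating, p - min_rating] += 1
    # Expected matrix under rater independence, scaled to the same total count
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / observed.sum()
    # Quadratic disagreement weights: penalty grows with squared rating distance
    i, j = np.indices((n, n))
    weights = (i - j) ** 2 / (n - 1) ** 2
    return 1.0 - (weights * observed).sum() / (weights * expected).sum()

def scoring_aware_reward(y_true, y_pred, min_rating, max_rating, beta=0.1):
    """Hypothetical scalar reward: QWK agreement minus a scaled MSE penalty.

    `beta` is an assumed trade-off hyperparameter; the paper's exact
    combination of the QWK reward and the MSE penalty may differ.
    """
    qwk = quadratic_weighted_kappa(y_true, y_pred, min_rating, max_rating)
    mse = float(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2))
    return qwk - beta * mse
```

Because QWK is computed from discrete confusion counts, it is non-differentiable; using it as an RL reward signal, as sketched here, sidesteps the need for gradients through the metric itself.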
- Anthology ID: 2024.emnlp-main.917
- Volume: Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
- Month: November
- Year: 2024
- Address: Miami, Florida, USA
- Editors: Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
- Venue: EMNLP
- Publisher: Association for Computational Linguistics
- Pages: 16427–16438
- URL: https://aclanthology.org/2024.emnlp-main.917/
- DOI: 10.18653/v1/2024.emnlp-main.917
- Cite (ACL): Heejin Do, Sangwon Ryu, and Gary Lee. 2024. Autoregressive Multi-trait Essay Scoring via Reinforcement Learning with Scoring-aware Multiple Rewards. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 16427–16438, Miami, Florida, USA. Association for Computational Linguistics.
- Cite (Informal): Autoregressive Multi-trait Essay Scoring via Reinforcement Learning with Scoring-aware Multiple Rewards (Do et al., EMNLP 2024)
- PDF: https://aclanthology.org/2024.emnlp-main.917.pdf