Automated Essay Scoring Using Grammatical Variety and Errors with Multi-Task Learning and Item Response Theory

Kosuke Doi, Katsuhito Sudoh, Satoshi Nakamura


Abstract
This study examines the effect of grammatical features in automated essay scoring (AES). We use two kinds of grammatical features as input to an AES model: (1) grammatical items that writers used correctly in their essays, and (2) the number of grammatical errors. Experimental results show that grammatical features improve the performance of AES models that predict the holistic scores of essays. Multi-task learning with holistic and grammar scores, combined with the grammatical features, yielded a larger improvement in model performance. We also show that a model using grammar abilities estimated by Item Response Theory (IRT) as labels for the auxiliary task performed comparably to one trained on grammar scores assigned by human raters. In addition, we weight the grammatical features with IRT to account for the difficulty of grammatical items and writers’ grammar abilities, and we found that weighting the features by item difficulty led to further improvement in performance.
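As a rough illustration of the approach the abstract describes (this is not the authors' implementation; the module names, feature dimensions, sigmoid weighting, and loss mixing weight below are all assumptions), the following PyTorch sketch fuses difficulty-weighted grammatical-item features and an error count with an essay embedding, then trains two heads jointly on the holistic score (main task) and a grammar score or IRT-estimated grammar ability (auxiliary task). In 2PL IRT, an item's difficulty b and discrimination a come from the response model P(correct | θ) = 1 / (1 + exp(−a(θ − b))); here the pre-estimated difficulties simply rescale the item features.

```python
# Hypothetical sketch only -- not the authors' code. Module names, feature
# dimensions, the sigmoid weighting, and the loss mixing weight are all
# assumptions made for illustration.
import torch
import torch.nn as nn

NUM_GRAMMAR_ITEMS = 500  # assumed size of the grammatical-item inventory
ESSAY_EMB_DIM = 768      # assumed essay-encoder output size (e.g., a BERT [CLS] vector)

class MultiTaskAES(nn.Module):
    def __init__(self, item_difficulty: torch.Tensor):
        super().__init__()
        # Per-item IRT difficulties (the 2PL "b" parameters), estimated
        # offline and kept fixed during training.
        self.register_buffer("difficulty", item_difficulty)
        self.fuse = nn.Sequential(
            nn.Linear(ESSAY_EMB_DIM + NUM_GRAMMAR_ITEMS + 1, 256),
            nn.ReLU(),
        )
        self.holistic_head = nn.Linear(256, 1)  # main task: holistic score
        self.grammar_head = nn.Linear(256, 1)   # auxiliary task: grammar score

    def forward(self, essay_emb, item_usage, error_count):
        # Scale each correctly used item by a function of its difficulty, so
        # harder items carry more weight as evidence of grammar ability.
        weighted_items = item_usage * torch.sigmoid(self.difficulty)
        x = torch.cat([essay_emb, weighted_items, error_count], dim=-1)
        h = self.fuse(x)
        return self.holistic_head(h), self.grammar_head(h)

def multitask_loss(pred_h, pred_g, gold_h, gold_g, lam=0.5):
    # Joint objective over the main (holistic) and auxiliary (grammar) tasks;
    # gold_g may be a human grammar score or an IRT-estimated ability.
    mse = nn.functional.mse_loss
    return mse(pred_h, gold_h) + lam * mse(pred_g, gold_g)

# Minimal smoke test with random inputs.
model = MultiTaskAES(item_difficulty=torch.randn(NUM_GRAMMAR_ITEMS))
essay_emb = torch.randn(4, ESSAY_EMB_DIM)           # batch of 4 essay embeddings
item_usage = torch.randint(0, 2, (4, NUM_GRAMMAR_ITEMS)).float()
error_count = torch.rand(4, 1)                      # normalized error counts
pred_h, pred_g = model(essay_emb, item_usage, error_count)
```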
Anthology ID:
2024.bea-1.26
Volume:
Proceedings of the 19th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2024)
Month:
June
Year:
2024
Address:
Mexico City, Mexico
Editors:
Ekaterina Kochmar, Marie Bexte, Jill Burstein, Andrea Horbach, Ronja Laarmann-Quante, Anaïs Tack, Victoria Yaneva, Zheng Yuan
Venue:
BEA
SIG:
SIGEDU
Publisher:
Association for Computational Linguistics
Pages:
316–329
URL:
https://aclanthology.org/2024.bea-1.26
Cite (ACL):
Kosuke Doi, Katsuhito Sudoh, and Satoshi Nakamura. 2024. Automated Essay Scoring Using Grammatical Variety and Errors with Multi-Task Learning and Item Response Theory. In Proceedings of the 19th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2024), pages 316–329, Mexico City, Mexico. Association for Computational Linguistics.
Cite (Informal):
Automated Essay Scoring Using Grammatical Variety and Errors with Multi-Task Learning and Item Response Theory (Doi et al., BEA 2024)
PDF:
https://preview.aclanthology.org/jeptaln-2024-ingestion/2024.bea-1.26.pdf