Comparing Traditional and LLM-based Approaches for Automated Scoring of Dutch Writing Products

Joni Kruijsbergen, Orphee De Clercq


Abstract
This research examines several traditional and recent approaches for automated grading of Dutch texts written by adolescent L1 speakers. We relied on a proprietary dataset comprising human-scored texts. Following recent paradigms in NLP research, we compared training a feature-based model to fine-tuning both mono- and multilingual BERT-based and generative large language models. The latter were also prompted directly in a zero-shot setting. The results reveal that the feature-based and BERT-based approaches are promising for the task at hand and even complementary, although there is still room for improvement. The error analysis demonstrates that the generative models not only make more classification errors, but that these errors are also more problematic. We therefore conclude that generative LLMs in particular are not directly employable in this educational context.
Anthology ID:
2026.lrec-main.44
Volume:
Proceedings of the Fifteenth Language Resources and Evaluation Conference
Month:
May
Year:
2026
Address:
Palma de Mallorca, Spain
Editors:
Stelios Piperidis, Núria Bel, Henk van den Heuvel, Nancy Ide, Simon Krek, Antonio Toral
Venue:
LREC
Publisher:
ELRA Language Resource Association
Pages:
619–630
URL:
https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.44/
Cite (ACL):
Joni Kruijsbergen and Orphee De Clercq. 2026. Comparing Traditional and LLM-based Approaches for Automated Scoring of Dutch Writing Products. International Conference on Language Resources and Evaluation, main:619–630.
Cite (Informal):
Comparing Traditional and LLM-based Approaches for Automated Scoring of Dutch Writing Products (Kruijsbergen & De Clercq, LREC 2026)
PDF:
https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.44.pdf