End-to-End Automated Item Generation and Scoring for Adaptive English Writing Assessment with Large Language Models

Kamel Nebhi, Amrita Panesar, Hans Bantilan


Abstract
Automated item generation (AIG) is a key enabler for scaling language proficiency assessments. We present an end-to-end methodology for the automated generation, annotation, and integration of adaptive writing items for the EF Standard English Test (EFSET), leveraging recent advances in large language models (LLMs). Our pipeline uses few-shot prompting with state-of-the-art LLMs to generate diverse, proficiency-aligned prompts, which are then rigorously validated by expert reviewers. For robust scoring, we construct a synthetic response dataset via majority-vote LLM annotation and fine-tune a LLaMA 3.1 (8B) model. For each writing item, we produce a range of proficiency-aligned synthetic responses, designed to emulate authentic student work, for model training and evaluation. Our results demonstrate substantial gains in scalability and validity, offering a replicable framework for next-generation adaptive language testing.
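To make the majority-vote annotation step concrete, the sketch below shows one plausible shape for it; this is an illustrative assumption, not the authors' released code. The annotator callables, the annotate_response and majority_vote helpers, the prompt wording, and the CEFR label set are all hypothetical choices for the example.

    # Hypothetical sketch of majority-vote LLM annotation, assuming each
    # annotator is a callable that maps a prompt string to a CEFR label.
    from collections import Counter
    from typing import Callable, List, Optional

    CEFR_LEVELS = ["A1", "A2", "B1", "B2", "C1", "C2"]  # assumed label set

    def majority_vote(labels: List[str], min_agreement: int = 2) -> Optional[str]:
        """Return the most frequent label if it meets the agreement
        threshold, otherwise None (e.g., route to expert review)."""
        if not labels:
            return None
        label, count = Counter(labels).most_common(1)[0]
        return label if count >= min_agreement else None

    def annotate_response(
        task: str,
        response: str,
        annotators: List[Callable[[str], str]],
    ) -> Optional[str]:
        """Ask each LLM annotator for a CEFR level and take a majority vote."""
        instruction = (
            "Rate the CEFR level (A1-C2) of the following response to the "
            f"writing task.\nTask: {task}\nResponse: {response}\nLevel:"
        )
        votes = [annotator(instruction).strip() for annotator in annotators]
        votes = [v for v in votes if v in CEFR_LEVELS]  # drop malformed outputs
        return majority_vote(votes)

Under this reading, the voted labels paired with their (item, response) pairs would then form the supervised fine-tuning data for the LLaMA 3.1 (8B) scoring model described in the abstract.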
Anthology ID:
2025.bea-1.73
Volume:
Proceedings of the 20th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2025)
Month:
July
Year:
2025
Address:
Vienna, Austria
Editors:
Ekaterina Kochmar, Bashar Alhafni, Marie Bexte, Jill Burstein, Andrea Horbach, Ronja Laarmann-Quante, Anaïs Tack, Victoria Yaneva, Zheng Yuan
Venues:
BEA | WS
Publisher:
Association for Computational Linguistics
Pages:
968–977
URL:
https://preview.aclanthology.org/acl25-workshop-ingestion/2025.bea-1.73/
Cite (ACL):
Kamel Nebhi, Amrita Panesar, and Hans Bantilan. 2025. End-to-End Automated Item Generation and Scoring for Adaptive English Writing Assessment with Large Language Models. In Proceedings of the 20th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2025), pages 968–977, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
End-to-End Automated Item Generation and Scoring for Adaptive English Writing Assessment with Large Language Models (Nebhi et al., BEA 2025)
PDF:
https://preview.aclanthology.org/acl25-workshop-ingestion/2025.bea-1.73.pdf