Amrita Panesar


2025

End-to-End Automated Item Generation and Scoring for Adaptive English Writing Assessment with Large Language Models
Kamel Nebhi | Amrita Panesar | Hans Bantilan
Proceedings of the 20th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2025)

Automated item generation (AIG) is a key enabler for scaling language proficiency assessments. We present an end-to-end methodology for the automated generation, annotation, and integration of adaptive writing items for the EF Standard English Test (EFSET), leveraging recent advances in large language models (LLMs). Our pipeline uses few-shot prompting with state-of-the-art LLMs to generate diverse, proficiency-aligned writing prompts, which are rigorously validated by expert reviewers. For robust scoring, we construct a synthetic response dataset via majority-vote LLM annotation: for each writing item, a range of proficiency-aligned synthetic responses, designed to emulate authentic student work, is produced for model training and evaluation. We then fine-tune a LLaMA 3.1 (8B) model on this dataset. Our results demonstrate substantial gains in scalability and validity, offering a replicable framework for next-generation adaptive language testing.
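The abstract does not include implementation details, but the majority-vote annotation step it describes can be sketched in a few lines of Python. In the sketch below, score_with_llm is a hypothetical placeholder for a single LLM annotation call, and the five-run default and the tie-handling policy (routing ties to expert review) are assumptions for illustration, not details taken from the paper.

    from collections import Counter
    from typing import Optional

    def score_with_llm(response: str, seed: int) -> str:
        """Hypothetical helper: one LLM annotation run that returns a
        proficiency label (e.g. "A2", "B1") for a written response.
        Stands in for a real LLM API call."""
        raise NotImplementedError

    def majority_vote(response: str, n_runs: int = 5) -> Optional[str]:
        """Collapse several independent LLM annotations of the same
        response into a single training label by majority vote.

        Returns None on a tie so the response can be escalated to
        expert review instead of silently entering the fine-tuning set.
        """
        labels = [score_with_llm(response, seed=i) for i in range(n_runs)]
        ranked = Counter(labels).most_common()
        if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
            return None  # no clear majority
        return ranked[0][0]

Under this reading, only responses with a clear cross-run consensus would contribute labels to the synthetic dataset used to fine-tune the LLaMA 3.1 (8B) scoring model.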