Using LLMs for item creation: Validating the potential of automatically generated sentence repetition test items for language assessment

Sarah Löber; Björn Rudzewitz; Yuan Chu; Mengyuan He; Shiqin Liu; Yushan Ye; Xiaobin Chen

Abstract

Various aspects of the Elicited Imitation Test (EIT), a sentence repetition task for language assessment, can be automated, for example test administration and scoring. Test items could potentially also be generated with Large Language Models (LLMs). This study investigates the potential of GPT-4o for item creation in the context of EIT, creating a parallel form to two popular and validated tests. We analysed the tests in terms of their linguistic and psychometric properties. While the items created by the LLM show some differences in grammatical structures when compared to human-written items, linguistic complexity results did not differ significantly between tests. Psychometric properties showed only minor differences. These findings lend support to the potential of Automatic Item Generation with LLMs in the context of sentence repetition tasks and might support the process of standardisation in SLA research and testing by enabling parallel test creation.

Anthology ID: 2026.bea-1.30
Volume: Proceedings of the 21st Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2026)
Month: July
Year: 2026
Address: San Diego, California, USA
Editors: Ekaterina Kochmar, Bashar Alhafni, Stefano Bannò, Marie Bexte, Jill Burstein, Andrea Horbach, Ronja Laarmann-Quante, Anais Tack, Victoria Yaneva, Zheng Yuan
Venues: BEA | WS
Publisher: Association for Computational Linguistics
Pages: 443–454
URL: https://preview.aclanthology.org/ingest-acl-workshops/2026.bea-1.30/
Cite (ACL): Sarah Löber, Björn Rudzewitz, Yuan Chu, Mengyuan He, Shiqin Liu, Yushan Ye, and Xiaobin Chen. 2026. Using LLMs for item creation: Validating the potential of automatically generated sentence repetition test items for language assessment. In Proceedings of the 21st Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2026), pages 443–454, San Diego, California, USA. Association for Computational Linguistics.
Cite (Informal): Using LLMs for item creation: Validating the potential of automatically generated sentence repetition test items for language assessment (Löber et al., BEA 2026)
PDF: https://preview.aclanthology.org/ingest-acl-workshops/2026.bea-1.30.pdf