Simulating Rating Scale Responses with LLMs for Early-Stage Item Evaluation

Onur Demirkaya; Hsin-Ro Wei; Evelyn Johnson

Abstract

This study explores the use of large language models to simulate human responses to Likert-scale items. A DeBERTa-base model fine-tuned with item text and examinee ability emulates a graded response model (GRM). High alignment with GRM probabilities and reasonable threshold recovery support LLMs as scalable tools for early-stage item evaluation.
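For readers unfamiliar with the GRM, the category probabilities that the fine-tuned model is said to emulate can be sketched as follows. This is a minimal illustration, assuming the standard logistic form of the graded response model with one discrimination parameter `a` and ordered thresholds; the function name and parameter values are hypothetical and are not taken from the paper.

```python
import math

def grm_category_probs(theta, a, thresholds):
    """Category probabilities under a logistic graded response model.

    theta      -- examinee ability
    a          -- item discrimination (hypothetical value in the example)
    thresholds -- ordered category thresholds b_1 < ... < b_{K-1}
    """
    # Cumulative probabilities P(X >= k) for k = 1..K-1
    cum = [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    # Pad with P(X >= 0) = 1 and P(X >= K) = 0, then difference adjacent terms
    bounds = [1.0] + cum + [0.0]
    return [bounds[k] - bounds[k + 1] for k in range(len(bounds) - 1)]

# Illustrative five-category Likert item; all numbers are made up.
probs = grm_category_probs(theta=0.5, a=1.2, thresholds=[-1.5, -0.5, 0.5, 1.5])
print([round(p, 3) for p in probs])
```

Alignment of the kind the abstract describes could then be checked by comparing a model's predicted category distribution with these probabilities for the same item and ability value.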

Anthology ID: 2025.aimecon-main.41
Volume: Proceedings of the Artificial Intelligence in Measurement and Education Conference (AIME-Con): Full Papers
Month: October
Year: 2025
Address: Wyndham Grand Pittsburgh, Downtown, Pittsburgh, Pennsylvania, United States
Editors: Joshua Wilson, Christopher Ormerod, Magdalen Beiting Parrish
Venue: AIME-Con
Publisher: National Council on Measurement in Education (NCME)
Pages: 385–392
URL: https://preview.aclanthology.org/ingest-emnlp/2025.aimecon-main.41/
Cite (ACL): Onur Demirkaya, Hsin-Ro Wei, and Evelyn Johnson. 2025. Simulating Rating Scale Responses with LLMs for Early-Stage Item Evaluation. In Proceedings of the Artificial Intelligence in Measurement and Education Conference (AIME-Con): Full Papers, pages 385–392, Wyndham Grand Pittsburgh, Downtown, Pittsburgh, Pennsylvania, United States. National Council on Measurement in Education (NCME).
Cite (Informal): Simulating Rating Scale Responses with LLMs for Early-Stage Item Evaluation (Demirkaya et al., AIME-Con 2025)
PDF: https://preview.aclanthology.org/ingest-emnlp/2025.aimecon-main.41.pdf
