Can LLMs Reliably Simulate Real Students’ Abilities in Mathematics and Reading Comprehension?

KV Aditya Srivatsa, Kaushal Maurya, Ekaterina Kochmar


Abstract
Large Language Models (LLMs) are increasingly used as proxy students in the development of Intelligent Tutoring Systems (ITSs) and in piloting test questions. However, the extent to which these proxy students accurately emulate the behavior and characteristics of real students remains an open question. To investigate this, we collect a dataset of 489 items from the National Assessment of Educational Progress (NAEP), covering mathematics and reading comprehension in grades 4, 8, and 12. We then apply an Item Response Theory (IRT) model to position 11 diverse and state-of-the-art LLMs on the same ability scale as real student populations. Our findings reveal that, without guidance, strong general-purpose models consistently outperform the average student at every grade, while weaker or domain-mismatched models may align only incidentally. Grade-enforcement prompts change models' performance, but whether they align with the average grade-level student remains highly model- and prompt-specific: no evaluated model-prompt pair aligns reliably across subjects and grades, underscoring the need for new training and evaluation strategies. We conclude by providing guidelines for the selection of viable proxies based on our findings. All related code and data have been made available (https://github.com/kvadityasrivatsa/IRT-for-LLMs-as-Students).
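For readers unfamiliar with how an IRT model places a respondent on a shared ability scale, the sketch below illustrates the general idea using a standard two-parameter logistic (2PL) model. The item parameters, response pattern, and use of scipy are illustrative assumptions for exposition, not the paper's actual calibration pipeline or data.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Hypothetical 2PL item parameters (discrimination a_i, difficulty b_i),
# assumed to have been calibrated on real student response data.
a = np.array([1.2, 0.8, 1.5, 1.0])   # discrimination
b = np.array([-0.5, 0.3, 1.1, 0.0])  # difficulty

# Example binary response pattern from an LLM "proxy student" on the same items (1 = correct).
responses = np.array([1, 1, 0, 1])

def neg_log_likelihood(theta):
    """Negative log-likelihood of the response pattern under the 2PL model."""
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))   # P(correct | theta)
    p = np.clip(p, 1e-9, 1 - 1e-9)               # numerical safety
    return -np.sum(responses * np.log(p) + (1 - responses) * np.log(1 - p))

# Maximum-likelihood ability estimate, placing the LLM on the same theta scale
# as the population used to calibrate the item parameters.
result = minimize_scalar(neg_log_likelihood, bounds=(-4, 4), method="bounded")
print(f"Estimated ability (theta): {result.x:.2f}")
```

Under this kind of setup, an estimated theta near the mean ability of a given grade-level population would indicate alignment with the average student, while a substantially higher theta reflects the overperformance reported in the abstract.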
Anthology ID:
2025.bea-1.75
Volume:
Proceedings of the 20th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2025)
Month:
July
Year:
2025
Address:
Vienna, Austria
Editors:
Ekaterina Kochmar, Bashar Alhafni, Marie Bexte, Jill Burstein, Andrea Horbach, Ronja Laarmann-Quante, Anaïs Tack, Victoria Yaneva, Zheng Yuan
Venues:
BEA | WS
Publisher:
Association for Computational Linguistics
Pages:
988–1001
URL:
https://preview.aclanthology.org/acl25-workshop-ingestion/2025.bea-1.75/
Cite (ACL):
KV Aditya Srivatsa, Kaushal Maurya, and Ekaterina Kochmar. 2025. Can LLMs Reliably Simulate Real Students’ Abilities in Mathematics and Reading Comprehension?. In Proceedings of the 20th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2025), pages 988–1001, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
Can LLMs Reliably Simulate Real Students’ Abilities in Mathematics and Reading Comprehension? (Srivatsa et al., BEA 2025)
PDF:
https://preview.aclanthology.org/acl25-workshop-ingestion/2025.bea-1.75.pdf