Can LLMs simulate the same correct solutions to free-response math problems as real students?

Yuya Asano, Diane Litman, Erin Walker


Abstract
Large language models (LLMs) have emerged as powerful tools for developing educational systems. While previous studies have explored modeling student mistakes, a critical gap remains in understanding whether LLMs can generate correct solutions that represent student responses to free-response problems. In this paper, we compare the distribution of solutions produced by four LLMs (one proprietary model, two open-source general models, and one open-source math model) under various sampling and prompting techniques with those generated by students, using conversations where students teach math problems to a conversational robot. Our study reveals discrepancies between the correct solutions produced by LLMs and by students. We discuss the practical implications of these findings for the design and evaluation of LLM-supported educational systems.
Anthology ID:
2025.emnlp-main.827
Volume:
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
16347–16376
URL:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.827/
Cite (ACL):
Yuya Asano, Diane Litman, and Erin Walker. 2025. Can LLMs simulate the same correct solutions to free-response math problems as real students?. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 16347–16376, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
Can LLMs simulate the same correct solutions to free-response math problems as real students? (Asano et al., EMNLP 2025)
PDF:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.827.pdf
Checklist:
 2025.emnlp-main.827.checklist.pdf