Fabian Brunner


2025

On the Effectiveness of Prompt-Moderated LLMs for Math Tutoring at the Tertiary Level
Sebastian Steindl | Fabian Brunner | Nada Sissouno | Dominik Schwagerl | Florian Schöler-Niewiera | Ulrich Schäfer
Findings of the Association for Computational Linguistics: EMNLP 2025

Large Language Models (LLMs) have been studied intensively in the context of education, yielding heterogeneous results. These models are now also deployed in formal educational institutions. While specialized models exist, the use of prompt-moderated LLMs is widespread. In this study, we therefore investigate the effectiveness of prompt-moderated LLMs for math tutoring at the tertiary level. We conduct a three-phase study in which students (N=49) first receive a review of the topics, then solve exercises, and finally write an exam. During the exercises, they are presented with different types of assistance. We analyze the effect of LLM usage on the students’ performance, their engagement with the LLM, and their conversation strategies. Our results show that prompt moderation had a negative influence compared to an unmoderated LLM. However, once the assistance was removed, both LLM groups performed better than the control group, contradicting concerns about shallow learning. We publish the annotated conversations as a dataset to foster future research.