2025
Large Language Models for Education: Understanding the Needs of Stakeholders, Current Capabilities and the Path Forward
Sankalan Pal Chowdhury | Nico Daheim | Ekaterina Kochmar | Jakub Macina | Donya Rooein | Mrinmaya Sachan | Shashank Sonkar
Proceedings of the 20th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2025)
This tutorial aims to bridge the gap between NLP researchers and Artificial Intelligence in Education (AIED) practitioners. It will help participants understand the requirements and challenges of education, enabling them to develop LLMs that align with educational needs, and will give educators a deeper understanding of the capabilities and limitations of current NLP technologies, fostering the effective integration of LLMs in educational contexts.
Educators’ Perceptions of Large Language Models as Tutors: Comparing Human and AI Tutors in a Blind Text-only Setting
Sankalan Pal Chowdhury | Terry Jingchen Zhang | Donya Rooein | Dirk Hovy | Tanja Käser | Mrinmaya Sachan
Proceedings of the 20th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2025)
The rapid development of Large Language Models (LLMs) opens up the possibility of using them as personal tutors. This has led to the development of several intelligent tutoring systems and learning assistants that use LLMs as back-ends with varying degrees of engineering. In this study, we compare human tutors with LLM tutors in terms of engagement, empathy, scaffolding, and conciseness. We ask human tutors to compare the performance of an LLM tutor with that of a human tutor in teaching grade-school math word problems on these qualities. We find that annotators with teaching experience perceive LLMs as performing better than human tutors on all four metrics. The biggest advantage is in empathy, where 80% of our annotators prefer the LLM tutor more often than the human tutors. Our study paints a positive picture of LLMs as tutors and indicates that these models can be used to reduce the load on human teachers in the future.
Are Large Language Models for Education Reliable Across Languages?
Vansh Gupta | Sankalan Pal Chowdhury | Vilém Zouhar | Donya Rooein | Mrinmaya Sachan
Proceedings of the 20th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2025)
Large language models (LLMs) are increasingly being adopted in educational settings, and these applications now extend beyond English even though current LLMs remain primarily English-centric. In this work, we examine whether their use in non-English educational settings is warranted. We evaluate the performance of popular LLMs on four educational tasks: identifying student misconceptions, providing targeted feedback, interactive tutoring, and grading translations. We run these tasks in eight languages (Mandarin, Hindi, Arabic, German, Farsi, Telugu, Ukrainian, and Czech) in addition to English. We find that performance on these tasks roughly corresponds to how well each language is represented in the training data, with lower-resource languages showing poorer task performance. However, at least some models are able to largely maintain their level of performance across all languages. We therefore recommend that practitioners verify that the LLM works well in the target language for their educational task before deployment.
2024
Book2Dial: Generating Teacher Student Interactions from Textbooks for Cost-Effective Development of Educational Chatbots
Junling Wang | Jakub Macina | Nico Daheim | Sankalan Pal Chowdhury | Mrinmaya Sachan
Findings of the Association for Computational Linguistics: ACL 2024
Educational chatbots are a promising tool for assisting student learning. However, the development of effective chatbots in education has been challenging, as high-quality data is seldom available in this domain. In this paper, we propose a framework for generating synthetic teacher-student interactions grounded in a set of textbooks. Our approaches capture a key aspect of learning interactions: curious students with partial knowledge interactively ask teachers questions about the material in the textbook. We highlight various quality criteria that such dialogues must fulfill and compare several approaches that rely on either prompting or fine-tuning large language models according to these criteria. We use the synthetic dialogues to train educational chatbots and show the benefits of further fine-tuning in educational domains. However, careful human evaluation shows that our best data synthesis method still suffers from hallucinations and tends to reiterate information from previous conversations. Our findings offer insights for future efforts to synthesize conversational data that strikes a balance between size and quality. We will open-source our data and code.