Andreas Nehring


2026

Talk moves are discourse categories used to analyse classroom interactions. They provide insights into the types of exchanges between teachers and students and can serve as indicators of teaching quality, supporting feedback and reflection. The automatic classification of talk moves is therefore valuable for educational research and teacher development. While previous studies have explored this task, almost all have focused on English data. We constructed a small corpus of German science classroom transcripts and investigated whether multilingual language models can classify talk moves effectively under data-scarce conditions. Specifically, we examined (1) training with a very limited amount of German data and (2) cross-lingual transfer from English training data, which also entails cross-cultural adaptation. Our results show that multilingual large language models are capable of cross-lingual and cross-cultural transfer, but models trained directly on even a small amount of German data achieve better performance. Combining English and German data yields the best results overall, though the additional benefit of including English data is small.
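The data-combination setup described above can be sketched as follows. This is a minimal illustration only: the function name, label, and example utterances are assumptions, since the abstract does not specify the corpus format or the talk-move scheme used.

```python
# Hypothetical sketch of the training-data setup: a small German
# talk-move dataset, optionally augmented with English examples before
# fine-tuning a multilingual model downstream. Labels and utterances
# are illustrative only.

def build_training_set(german, english=None):
    """Return (utterance, talk_move) pairs for fine-tuning.

    german  -- small in-language dataset (trained alone, it already
               outperforms pure cross-lingual transfer in the study)
    english -- optional larger dataset; combining both gave the best
               results overall, though the extra gain was small
    """
    data = list(german)
    if english:
        data.extend(english)
    return data

german_only = build_training_set([("Was meinst du dazu?", "eliciting")])
combined = build_training_set(
    [("Was meinst du dazu?", "eliciting")],
    english=[("What do you think about that?", "eliciting")],
)
```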

2024

Individual feedback can help students improve their essay writing skills. However, the manual effort required to provide such feedback limits individualization in practice. Automatically generated essay feedback may serve as an alternative to guide students at their own pace, convenience, and desired frequency. Large language models (LLMs) have demonstrated strong performance in generating coherent and contextually relevant text. Yet, their ability to provide helpful essay feedback is unclear. This work explores several prompting strategies for LLM-based zero-shot and few-shot generation of essay feedback. Inspired by Chain-of-Thought prompting, we study how and to what extent automated essay scoring (AES) can benefit the quality of generated feedback. We evaluate both the AES performance that LLMs can achieve with prompting only and the helpfulness of the generated essay feedback. Our results suggest that tackling AES and feedback generation jointly improves AES performance. However, while our manual evaluation confirms the quality of the generated essay feedback, the influence of essay scoring on that feedback ultimately remains small.
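The score-then-feedback prompting idea could be illustrated as follows. All prompt wording, the function name, and the example fields are assumptions for illustration; they are not the paper's actual prompts.

```python
# Hypothetical sketch: building a zero-/few-shot prompt that asks an
# LLM to score an essay first (AES) and then generate feedback, in the
# spirit of Chain-of-Thought prompting. All wording is illustrative.

def build_feedback_prompt(essay, few_shot_examples=(), score_first=True):
    parts = ["You are an experienced writing tutor."]
    # Few-shot examples pair an essay with a score and feedback text.
    for ex_essay, ex_score, ex_feedback in few_shot_examples:
        parts.append(
            f"Essay: {ex_essay}\nScore: {ex_score}\nFeedback: {ex_feedback}"
        )
    parts.append(f"Essay: {essay}")
    if score_first:
        # Scoring before generating feedback mirrors the joint
        # AES + feedback setup, which improved AES performance.
        parts.append("First assign a holistic score from 1 to 6, then "
                     "give concrete, constructive feedback.")
    else:
        parts.append("Give concrete, constructive feedback.")
    return "\n\n".join(parts)

prompt = build_feedback_prompt(
    "My summer holidays were ...",
    few_shot_examples=[("Example essay ...", 4, "Good structure, but ...")],
)
```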