LLM-Human Alignment in Evaluating Teacher Questioning Practices: Beyond Ratings to Explanation
Ruikun Hou, Tim Fütterer, Babette Bühler, Patrick Schreyer, Peter Gerjets, Ulrich Trautwein, Enkelejda Kasneci
Abstract
This study investigates the alignment between large language models (LLMs) and human raters in assessing teacher questioning practices, moving beyond agreement on ratings to the evidence each selects to justify its decisions. Findings highlight LLMs’ potential to support large-scale classroom observation through interpretable, evidence-based scoring, with possible implications for concrete teacher feedback.
- Anthology ID:
- 2025.aimecon-main.26
- Volume:
- Proceedings of the Artificial Intelligence in Measurement and Education Conference (AIME-Con): Full Papers
- Month:
- October
- Year:
- 2025
- Address:
- Wyndham Grand Pittsburgh Downtown, Pittsburgh, Pennsylvania, United States
- Editors:
- Joshua Wilson, Christopher Ormerod, Magdalen Beiting Parrish
- Venue:
- AIME-Con
- Publisher:
- National Council on Measurement in Education (NCME)
- Pages:
- 239–249
- URL:
- https://preview.aclanthology.org/ingest-emnlp/2025.aimecon-main.26/
- Cite (ACL):
- Ruikun Hou, Tim Fütterer, Babette Bühler, Patrick Schreyer, Peter Gerjets, Ulrich Trautwein, and Enkelejda Kasneci. 2025. LLM-Human Alignment in Evaluating Teacher Questioning Practices: Beyond Ratings to Explanation. In Proceedings of the Artificial Intelligence in Measurement and Education Conference (AIME-Con): Full Papers, pages 239–249, Wyndham Grand Pittsburgh Downtown, Pittsburgh, Pennsylvania, United States. National Council on Measurement in Education (NCME).
- Cite (Informal):
- LLM-Human Alignment in Evaluating Teacher Questioning Practices: Beyond Ratings to Explanation (Hou et al., AIME-Con 2025)
- PDF:
- https://preview.aclanthology.org/ingest-emnlp/2025.aimecon-main.26.pdf