Assessing the Quality and Consistency of Automated Knowledge Component Generation using Instructor-generated Questions and LLMs

Jordan Esiason; Priyanka Khare; Wookhee Min; Seung Lee; Gamze Ozogul; Xiaoying Zheng; Yeil Jeong

Abstract

Lecture-style instruction is one of the most prevalent forms of learning in postsecondary education in the United States. Although lectures are a convenient format, they tend to offer few opportunities for meaningful engagement between students and the course materials, in part because of the overhead of interacting with large numbers of students. Using large language models (LLMs), we have created a pipeline, built upon the ExplainIt classroom response system, that processes student self-explanations produced during lectures using automatically generated knowledge components. This pipeline can facilitate deeper engagement with course materials, offer traceability in assessment results, and allow instructors to respond to student errors or misconceptions in real time during lectures. While previous work using a proprietary large language model has examined the basic functionality of this pipeline, this work more closely examines its consistency and quality using both a large closed-weight model and a smaller open-weight model, with or without retrieval-augmented generation (RAG). The use of open-source models could allow institutions deploying ExplainIt to maintain control of their student data without substantially sacrificing performance. We find that while there are small, statistically significant differences in performance between the RAG conditions of each LLM, they are nearly comparable at this task. Additionally, the LLM-generated knowledge components are of higher quality when relevant course material is provided for RAG, although consistency is not improved. These results indicate that both large closed-weight and smaller open-weight models show promise in this task, but fine-tuning may be necessary to improve performance further.

Anthology ID: 2026.bea-1.18
Volume: Proceedings of the 21st Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2026)
Month: July
Year: 2026
Address: San Diego, California, USA
Editors: Ekaterina Kochmar, Bashar Alhafni, Stefano Bannò, Marie Bexte, Jill Burstein, Andrea Horbach, Ronja Laarmann-Quante, Anais Tack, Victoria Yaneva, Zheng Yuan
Venues: BEA | WS
Publisher: Association for Computational Linguistics
Pages: 248–258
URL: https://preview.aclanthology.org/ingest-acl-workshops/2026.bea-1.18/
Cite (ACL): Jordan Esiason, Priyanka Khare, Wookhee Min, Seung Lee, Gamze Ozogul, Xiaoying Zheng, and Yeil Jeong. 2026. Assessing the Quality and Consistency of Automated Knowledge Component Generation using Instructor-generated Questions and LLMs. In Proceedings of the 21st Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2026), pages 248–258, San Diego, California, USA. Association for Computational Linguistics.
Cite (Informal): Assessing the Quality and Consistency of Automated Knowledge Component Generation using Instructor-generated Questions and LLMs (Esiason et al., BEA 2026)
PDF: https://preview.aclanthology.org/ingest-acl-workshops/2026.bea-1.18.pdf
