EduVidQA: Generating and Evaluating Long-form Answers to Student Questions based on Lecture Videos

Sourjyadip Ray, Shubham Sharma, Somak Aditya, Pawan Goyal


Abstract
As digital platforms redefine educational paradigms, ensuring interactivity remains vital for effective learning. This paper explores using Multimodal Large Language Models (MLLMs) to automatically respond to student questions from online lectures, a novel question answering task of real-world significance. We introduce the EduVidQA Dataset with 5252 question-answer pairs (both synthetic and real-world) from 296 computer science videos covering diverse topics and difficulty levels. To understand the needs of the dataset and task evaluation, we empirically study the qualitative preferences of students, which we provide as an important contribution to this line of work. Our benchmarking experiments cover 6 state-of-the-art MLLMs, through which we study the effectiveness of our synthetic data for finetuning and show the challenging nature of the task. We evaluate the models using both text-based and qualitative metrics, giving a nuanced perspective on the models’ performance, which is paramount to future work. This work not only sets a benchmark for this important problem, but also opens exciting avenues for future research in the field of Natural Language Processing for Education.
Anthology ID:
2025.emnlp-main.1760
Volume:
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
34689–34715
URL:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.1760/
Cite (ACL):
Sourjyadip Ray, Shubham Sharma, Somak Aditya, and Pawan Goyal. 2025. EduVidQA: Generating and Evaluating Long-form Answers to Student Questions based on Lecture Videos. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 34689–34715, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
EduVidQA: Generating and Evaluating Long-form Answers to Student Questions based on Lecture Videos (Ray et al., EMNLP 2025)
PDF:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.1760.pdf
Checklist:
2025.emnlp-main.1760.checklist.pdf