Kevin Shi


2025

UPSC2M: Benchmarking Adaptive Learning from Two Million MCQ Attempts
Kevin Shi | Karttikeya Mangalam
Proceedings of the 20th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2025)

We present UPSC2M, a large-scale dataset comprising two million multiple-choice question attempts from over 46,000 students, spanning nearly 9,000 questions across seven subject areas. The questions are drawn from the Union Public Service Commission (UPSC) examination, one of India’s most competitive and high-stakes assessments. Each attempt includes both response correctness and time taken, enabling fine-grained analysis of learner behavior and question characteristics. On this dataset, we define two core benchmark tasks: question difficulty estimation and student performance prediction. The first task involves predicting empirical correctness rates using only question text. The second task focuses on predicting the likelihood of a correct response based on prior interactions. We evaluate simple baseline models on both tasks to demonstrate feasibility and establish reference points. Together, the dataset and benchmarks offer a strong foundation for building scalable, personalized educational systems. We release the dataset and code to support further research at the intersection of content understanding, learner modeling, and adaptive assessment.
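
The abstract frames the two benchmark tasks concretely enough to sketch how simple baselines might look. The snippet below is a minimal, hypothetical illustration only, not the paper's released code: the file names (questions.csv, attempts.csv) and column names (question_text, correct_rate, student_id, is_correct, timestamp) are assumptions about a plausible export of the dataset, and the models (TF-IDF with ridge regression for difficulty, a running per-student accuracy for performance prediction) are generic stand-ins rather than the authors' baselines.

```python
# Hypothetical baselines for the two UPSC2M benchmark tasks.
# File layouts and column names below are assumptions, not the dataset's documented schema.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error, roc_auc_score

# --- Task 1: question difficulty estimation (question text -> empirical correctness rate) ---
questions = pd.read_csv("questions.csv")  # assumed columns: question_text, correct_rate
X_train, X_test, y_train, y_test = train_test_split(
    questions["question_text"], questions["correct_rate"],
    test_size=0.2, random_state=0,
)
vec = TfidfVectorizer(max_features=20000, ngram_range=(1, 2))
reg = Ridge(alpha=1.0).fit(vec.fit_transform(X_train), y_train)
pred = reg.predict(vec.transform(X_test))
print("Difficulty estimation MAE:", mean_absolute_error(y_test, pred))

# --- Task 2: student performance prediction (prior interactions -> P(correct)) ---
attempts = pd.read_csv("attempts.csv").sort_values("timestamp")  # assumed columns
# Feature: running accuracy over each student's *previous* attempts (deliberately simple).
attempts["prior_acc"] = (
    attempts.groupby("student_id")["is_correct"]
    .transform(lambda s: s.shift().expanding().mean())
    .fillna(0.5)  # no history yet -> uninformative prior
)
# Score the running accuracy directly as a probability of a correct response.
print("Performance prediction AUC:", roc_auc_score(attempts["is_correct"], attempts["prior_acc"]))
```

Such baselines only use question text and aggregate response history; they ignore response time, which the abstract notes is also recorded per attempt and could serve as an additional signal.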