A Three-Tier LLM Framework for Forecasting Student Engagement from Qualitative Longitudinal Data
Ahatsham Hayat, Helen Martinez, Bilal Khan, Mohammad Rashedul Hasan
Abstract
Forecasting nuanced shifts in student engagement from longitudinal experiential (LE) data—multi-modal, qualitative trajectories of academic experiences over time—remains challenging due to high dimensionality and missingness. We propose a natural language processing (NLP)-driven framework using large language models (LLMs) to forecast binary engagement levels across four dimensions: Lecture Engagement Disposition, Academic Self-Efficacy, Performance Self-Evaluation, and Academic Identity and Value Perception. Evaluated on 960 trajectories from 96 first-year STEM students, our three-tier approach—LLM-informed imputation to generate textual descriptors for missing-not-at-random (MNAR) patterns, zero-shot feature selection via ensemble voting, and fine-tuned LLMs—processes textual non-cognitive responses. LLMs substantially outperform numeric baselines (e.g., Random Forest, LSTM) by capturing contextual nuances in student responses. Encoder-only LLMs surpass decoder-only variants, highlighting architectural strengths for sparse, qualitative LE data. Our framework advances NLP solutions for modeling student engagement from complex LE data, excelling where traditional methods struggle.
- Anthology ID:
- 2025.conll-1.22
- Volume:
- Proceedings of the 29th Conference on Computational Natural Language Learning
- Month:
- July
- Year:
- 2025
- Address:
- Vienna, Austria
- Editors:
- Gemma Boleda, Michael Roth
- Venues:
- CoNLL | WS
- Publisher:
- Association for Computational Linguistics
- Pages:
- 334–347
- URL:
- https://preview.aclanthology.org/display_plenaries/2025.conll-1.22/
- Cite (ACL):
- Ahatsham Hayat, Helen Martinez, Bilal Khan, and Mohammad Rashedul Hasan. 2025. A Three-Tier LLM Framework for Forecasting Student Engagement from Qualitative Longitudinal Data. In Proceedings of the 29th Conference on Computational Natural Language Learning, pages 334–347, Vienna, Austria. Association for Computational Linguistics.
- Cite (Informal):
- A Three-Tier LLM Framework for Forecasting Student Engagement from Qualitative Longitudinal Data (Hayat et al., CoNLL 2025)
- PDF:
- https://preview.aclanthology.org/display_plenaries/2025.conll-1.22.pdf