Zhiyi Duan


2026

Knowledge Tracing (KT) infers a student’s knowledge state from past interactions to predict future performance. Conventional Deep Learning (DL)-based KT models are typically tied to platform-specific identifiers and latent representations, making them hard to transfer and interpret. Large Language Model (LLM)-based methods can be either ungrounded under prompting or overly domain-dependent under fine-tuning. In addition, most existing KT methods are developed and evaluated under a same-distribution assumption. In real deployments, educational data often come from heterogeneous platforms with substantial distribution shift, which degrades generalization. To this end, we propose RAG-KT, a retrieval-augmented paradigm that frames cross-platform KT as reliable, context-constrained inference with LLMs. It builds a unified multi-source structured context, aligned across sources via Question Group abstractions, and retrieves rich, reliable, and complementary context for each prediction, enabling grounded prediction and interpretable diagnosis. Experiments on three public KT benchmarks demonstrate consistent gains in accuracy and robustness, including strong performance under cross-platform conditions.
Knowledge Tracing (KT) is essential for tracking students’ evolving knowledge states and predicting their future performance. While current graph-based methods focus on exercise-concept relations, they often overlook the inherent group structures among students. Similarly, emerging LLM-based approaches rely on individual histories, lacking the broader context of group references and contrastive evidence. As a result, existing individual-isolation paradigms fail to provide stable predictions and evidence-based explanations. To bridge this gap, we propose Micro-Community Knowledge Tracing (MicroC-KT), a framework that incorporates learning micro-environments to provide social-cognitive anchors for KT. MicroC-KT identifies latent learning communities via hypergraph modeling and generates dual-granular summaries to facilitate community matching and peer retrieval. By extracting contrastive group evidence, the model prompts an LLM to generate both accurate answer predictions and verifiable analysis reports. Experiments on four public datasets demonstrate that MicroC-KT significantly outperforms state-of-the-art baselines in predictive performance while providing more reliable and evidence-based explanations.
Current benchmarks for Large Reasoning Models (LRMs) primarily rely on answer correctness, failing to assess the structural coherence and cognitive soundness of the reasoning process itself. To address this gap, we introduce Cognitive Hierarchy Trace (CHT), a novel evaluation framework grounded in Bloom’s Cognitive Taxonomy (BCT). CHT provides a structured, step-wise mapping of a model’s reasoning trajectory onto hierarchical cognitive levels, enabling the detection of structural anomalies such as hierarchy jumps, breaks, and overthinking. Based on CHT, we present BloomEval, the first large-scale benchmark designed for fine-grained cognitive capability assessment. It comprises 94,602 math problems, each annotated with Bloom’s cognitive levels, CHT trajectories, a three-tier knowledge hierarchy, and problem difficulty. To ensure scalable yet reliable annotation, we develop an Expert-LLM collaborative pipeline with a three-stage reconciliation mechanism. Our comprehensive evaluation reveals a critical finding: models often arrive at correct answers through cognitively flawed or opaque reasoning paths. The CHT-based analysis uncovers prevalent structural inconsistencies that are invisible to outcome-only metrics, demonstrating that answer accuracy is an insufficient proxy for reasoning quality.
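As a rough illustration of the step-wise mapping described above, the sketch below treats a reasoning trace as a sequence of Bloom levels and flags upward jumps. It is not the CHT implementation: the names `BloomLevel` and `find_hierarchy_jumps`, the use of the revised six-level taxonomy, and the rule that a "hierarchy jump" is a climb of more than one level between consecutive steps are all assumptions made for this example.

```python
from enum import IntEnum


class BloomLevel(IntEnum):
    """Levels of the revised Bloom taxonomy, ordered from lowest to highest."""
    REMEMBER = 1
    UNDERSTAND = 2
    APPLY = 3
    ANALYZE = 4
    EVALUATE = 5
    CREATE = 6


def find_hierarchy_jumps(trajectory, max_step=1):
    """Return indices of steps that climb more than `max_step` levels
    above the preceding step (hypothetical definition of a "jump")."""
    return [
        i
        for i in range(1, len(trajectory))
        if trajectory[i] - trajectory[i - 1] > max_step
    ]


# Hypothetical trace: the third step skips from UNDERSTAND straight to EVALUATE.
trace = [
    BloomLevel.REMEMBER,
    BloomLevel.UNDERSTAND,
    BloomLevel.EVALUATE,
    BloomLevel.APPLY,
]
jumps = find_hierarchy_jumps(trace)
```

Here `jumps` holds the index of the step that skips levels. Breaks and overthinking would need their own definitions, which the abstract does not give.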
Teacher sentiment analysis is pivotal for understanding instructional dynamics, yet it remains challenging because classroom expressions are professionally regulated performances rather than spontaneous outbursts. Existing approaches typically treat sentiment as a static, monolithic label and fail to capture this structured heterogeneity. To model this complexity, we decompose teacher sentiment into three granularities: coarse-level performativity, medium-level intra-class heterogeneity, and fine-level cross-modal complementarity. Guided by this perspective, we propose CF-TSA, a coarse-to-fine multimodal framework. Specifically, we employ CLS-guided cross-modal attention to recover effective signals from regulated displays (coarse-level), thresholded substyle discovery to identify latent pedagogical styles (medium-level), and substyle-aware contrastive learning to align dynamic multimodal cue compositions (fine-level). Experiments on T-MED and CMU-MOSEI demonstrate that CF-TSA consistently outperforms state-of-the-art baselines, validating the effectiveness of the coarse-to-fine perspective and the hierarchical modeling.