Jingjiang Liu
2026
KCVR: Knowledge-Centric Video Reconstruction for Structured Pedagogical Summarization via Dynamic Graph Planning
Jingjiang Liu | Jia Zhu | Hanghui Guo | Weijie Shi | Yue Cui | Xiaokang Jin | Yilin Wang | Qingyu Niu | Jiawei Shen | Guoqing Ma | Yidan Liang | Shimin Di | Jiajie Xu
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Existing video summarization methods mainly compress content for gist browsing, but they often break the prerequisite logic in instructional videos and induce logical inversions (e.g., conclusions before premises). We formalize this problem as Structure-Pedagogical Reconstruction (SPR). SPR raises two challenges: (1) Structure Hallucination, where retrieved knowledge is topologically valid but not evidence-grounded by the blackboard; and (2) Logical Inversion, where soft prompt-level graph injection fails to enforce prerequisite order during decoding. To address these challenges, we propose Knowledge-Centric Video Reconstruction (KCVR), a Plan-then-Generate neuro-symbolic framework that decouples epistemic planning from content generation. KCVR prunes a Dual-Layer Epistemic Graph into a minimal video-supported plan, then realizes the plan with visually anchored attention and topology-constrained decoding. We additionally release EduStruct, a 10-discipline benchmark for SPR and structure-centric evaluation. Experiments show that KCVR outperforms strong end-to-end baselines on Knowledge Progression Consistency and Learning Objective Coverage. Our code and data are available at https://github.com/mark1001-ljj/video_sum.
RSDA: Restoring Stale Data Affinity via Dynamic Renovation Strategy for Mitigating Data Scarcity
Yidan Liang | Jia Zhu | Weijie Shi | Hanghui Guo | Yue Cui | Jiawei Shen | Guoqing Ma | Jingjiang Liu | Qingyu Niu | Yilin Wang | Shimin Di | Jiajie Xu
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
High-quality data is the cornerstone of advancing large language models. However, the field currently faces a critical dilemma: the supply of premium data is nearing depletion, while vast stale corpora remain underutilized. Our empirical analysis reveals that training models directly on such data often leads to performance degradation. We attribute this phenomenon to the data affinity gap, a misalignment stemming from either the model’s inability to effectively comprehend the data or the data’s inherent quality defects. To bridge this gap, we propose the Restoring Stale Data Affinity (RSDA) framework. First, using our proposed potential entropy metric, RSDA quantifies the latent value of samples to identify stale data with higher renovation potential. Subsequently, the framework employs a dynamic renovation strategy selection mechanism to determine the optimal component-level strategy for each instance, transforming low-affinity stale samples into high-quality training data. Comprehensive experimental results demonstrate that RSDA effectively enhances data affinity, achieving performance improvements using less than 10% of the data volume, thereby underscoring that the latent potential of stale corpora remains largely untapped. The code is available at https://github.com/wenfiii/RSDA.