Jia Zhu
2026
KCVR: Knowledge-Centric Video Reconstruction for Structured Pedagogical Summarization via Dynamic Graph Planning
Jingjiang Liu | Jia Zhu | Hanghui Guo | Weijie Shi | Yue Cui | Xiaokang Jin | Yilin Wang | Qingyu Niu | Jiawei Shen | Guoqing Ma | Yidan Liang | Shimin Di | Jiajie Xu
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Existing video summarization methods mainly compress content for gist browsing, but they often break the prerequisite logic in instructional videos and induce logical inversions (e.g., conclusions before premises). We formalize this problem as Structure-Pedagogical Reconstruction (SPR). SPR raises two challenges: (1) Structure Hallucination, where retrieved knowledge is topologically valid but not evidence-grounded by the blackboard; and (2) Logical Inversion, where soft prompt-level graph injection fails to enforce prerequisite order during decoding. To address these challenges, we propose Knowledge-Centric Video Reconstruction (KCVR), a Plan-then-Generate neuro-symbolic framework that decouples epistemic planning from content generation. KCVR prunes a Dual-Layer Epistemic Graph into a minimal video-supported plan, then realizes the plan with visually anchored attention and topology-constrained decoding. We additionally release EduStruct, a 10-discipline benchmark for SPR and structure-centric evaluation. Experiments show that KCVR outperforms strong end-to-end baselines on Knowledge Progression Consistency and Learning Objective Coverage. Our code and data are available at https://github.com/mark1001-ljj/video_sum.
RSDA: Restoring Stale Data Affinity via Dynamic Renovation Strategy for Mitigating Data Scarcity
Yidan Liang | Jia Zhu | Weijie Shi | Hanghui Guo | Yue Cui | Jiawei Shen | Guoqing Ma | Jingjiang Liu | Qingyu Niu | Yilin Wang | Shimin Di | Jiajie Xu
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
High-quality data is the cornerstone of advancing large language models. However, the field currently faces a critical dilemma: the supply of premium data is nearing depletion, while vast stale corpora remain underutilized. Our empirical analysis reveals that training models on such data directly often leads to performance degradation. We attribute this phenomenon to the data affinity gap, a misalignment stemming from the model’s inability to effectively comprehend the data or from inherent quality defects. To bridge this gap, we propose the Restoring Stale Data Affinity (RSDA) framework. First, utilizing our proposed potential entropy metric, RSDA quantifies the latent value of samples to effectively identify stale data with higher renovation potential. Subsequently, the framework employs a dynamic renovation strategy selection mechanism to determine the optimal component-level strategy for each instance, transforming low-affinity stale samples into high-quality training data. Comprehensive experimental results demonstrate that RSDA effectively enhances data affinity, achieving performance improvements using less than 10% of the data volume, thereby underscoring that the latent potential of stale corpora remains largely untapped. The code is available at https://github.com/wenfiii/RSDA.
ReTRE: Benchmarking LLM Transfer Robustness with Structure-Preserving Variants
ZhongDong Li | Weijie Shi | Yue Cui | Haolun MA | Yuanjun Liu | Jiawei Li | An Liu | Jia Zhu | Jiajie Xu
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Large language models (LLMs) have achieved strong performance on standard benchmarks, yet their performance is not robust across different task manifestations. It remains unclear how performance changes under controlled task rewrites that preserve the original solution structure while varying the rewrite type and level. To address this question, we introduce ReTRE (Rewrite-based Transfer Robustness Evaluation), an evaluation benchmark inspired by learning transfer theory that probes transfer robustness along two rewrite levels: Near Transfer and Far Transfer. ReTRE employs a multi-agent system to construct textual and visual variants while preserving the structure of the original solution. Evaluations on mathematical and science tasks across state-of-the-art multimodal LLMs reveal a consistent transfer gap: performance generally declines as transfer similarity drops, and strong text performance can degrade under cross-modal transfer. Crucially, we identify a divergence between post-training paradigms: reinforcement learning preserves transfer robustness, whereas supervised fine-tuning tends to overfit the training distribution, leading to severe degradation in far-transfer performance despite strong in-distribution accuracy.
2025
Making RALM Robust to Irrelevant Contexts via Layer Knowledge Guided Attention
Weijie Shi | Hao Chen | Jiaming Li | Yao Zhao | Yazhong Zhang | Qijin Chen | Jipeng Zhang | Ruiyuan Zhang | Jia Zhu | Jiajie Xu | Xiaofang Zhou
Findings of the Association for Computational Linguistics: ACL 2025
Retrieval-augmented language models (RALMs) aim to incorporate external knowledge to address the issues of factual hallucination and knowledge obsolescence faced by large language models (LLMs). Inevitably, passages retrieved by similarity search may be irrelevant to the given question, and aggregating these passages can confuse the model and prevent it from giving a correct answer. To improve the performance of RALMs in such conditions, we propose layer-knowledge guided attention for RALMs, which harnesses the layer-wise knowledge of LLMs to optimize per-layer attention on useful passages, making the model attend to the most relevant content and ignore the irrelevant. Specifically, we first systematically study LLMs’ attention patterns and their relationship with the accuracy of RALM responses, where middle-focus attentions play a crucial role in selectively gathering relevant information. Based on this, a layer-wise passage estimator leverages the varied knowledge encoded across LLM layers to assess not only passage relevance scores but also associated confidences. Finally, a relevance-aware passage fusion enables selective attention to relevant passages, mitigating distractibility and positional bias of causal attention. Experiments show that our method outperforms existing methods on RALM benchmarks.
DIDS: Domain Impact-aware Data Sampling for Large Language Model Training
Weijie Shi | Jipeng Zhang | Yaguang Wu | Jingzhi Fang | Shibo Zhang | Yao Zhao | Hao Chen | Ruiyuan Zhang | Yue Cui | Jia Zhu | Sirui Han | Jiajie Xu | Xiaofang Zhou
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Large language models (LLMs) are commonly trained on multi-domain datasets, where domain sampling strategies significantly impact model performance due to varying domain importance across downstream tasks. Existing approaches for optimizing domain-level sampling strategies struggle with maintaining intra-domain consistency and accurately measuring domain impact. In this paper, we present Domain Impact-aware Data Sampling (DIDS). To ensure intra-domain consistency, a gradient clustering algorithm is proposed to group training data based on their learning effects, where a proxy language model and dimensionality reduction are employed to reduce computational overhead. To accurately measure domain impact, we develop a Fisher Information Matrix (FIM) guided metric that quantifies how domain-specific parameter updates affect the model’s output distributions on downstream tasks, with theoretical guarantees. Furthermore, to determine optimal sampling ratios, DIDS combines both the FIM-guided domain impact assessment and loss learning trajectories that indicate domain-specific potential, while accounting for diminishing marginal returns. Extensive experiments demonstrate that DIDS achieves 3.4% higher average performance while maintaining comparable training efficiency. The code is available at https://github.com/shiweijiezero/DIDS.
LegalReasoner: Step-wised Verification-Correction for Legal Judgment Reasoning
Weijie Shi | Han Zhu | Jiaming Ji | Mengze Li | Jipeng Zhang | Ruiyuan Zhang | Jia Zhu | Jiajie Xu | Sirui Han | Yike Guo
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Legal judgment prediction (LJP) aims to function as a judge by making final rulings based on case claims and facts, which plays a vital role in the judicial domain for supporting court decision-making and improving judicial efficiency. However, existing methods often struggle with logical errors when conducting complex legal reasoning. We propose LegalReasoner, which enhances LJP reliability through step-wise verification and correction of the reasoning process. Specifically, it first identifies dispute points to decompose complex cases, and then conducts step-wise reasoning while employing a process verifier to validate each step’s logic from correctness, progressiveness, and potential perspectives. When errors are detected, expert-designed attribution and resolution strategies are applied for correction. To fine-tune LegalReasoner, we release the LegalHK dataset, containing 58,130 Hong Kong court cases with detailed annotations of dispute points, step-by-step reasoning chains, and process verification labels. Experiments demonstrate that LegalReasoner significantly improves concordance with court decisions from 72.37 to 80.27 on LLAMA-3.1-70B. The data is available at https://huggingface.co/datasets/weijiezz/LegalHK.
DioR: Adaptive Cognitive Detection and Contextual Retrieval Optimization for Dynamic Retrieval-Augmented Generation
Hanghui Guo | Jia Zhu | Shimin Di | Weijie Shi | Zhangze Chen | Jiajie Xu
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Dynamic Retrieval-augmented Generation (RAG) has shown great success in mitigating hallucinations in large language models (LLMs) during generation. However, existing dynamic RAG methods face significant limitations in two key aspects: 1) the lack of an effective mechanism to control retrieval triggers, and 2) the lack of effective scrutiny of retrieved content. To address these limitations, we propose an innovative dynamic RAG method, DioR (Adaptive Cognitive Detection and Contextual Retrieval Optimization), which consists of two main components: adaptive cognitive detection and contextual retrieval optimization, specifically designed to determine when retrieval is needed and what retrieved content is useful for LLMs. Experimental results demonstrate that DioR achieves superior performance on all tasks, demonstrating the effectiveness of our work.
2020
CUHK at SemEval-2020 Task 4: CommonSense Explanation, Reasoning and Prediction with Multi-task Learning
Hongru Wang | Xiangru Tang | Sunny Lai | Kwong Sak Leung | Jia Zhu | Gabriel Pui Cheong Fung | Kam-Fai Wong
Proceedings of the Fourteenth Workshop on Semantic Evaluation
This paper describes our system submitted to Task 4 of SemEval 2020: Commonsense Validation and Explanation (ComVE), which consists of three sub-tasks. The task is to validate whether a given sentence makes sense and to require the model to explain it. Based on the BERT architecture with a multi-task setting, we propose an effective and interpretable “Explain, Reason and Predict” (ERP) system to solve the three sub-tasks about commonsense: (a) Validation, (b) Reasoning, and (c) Explanation. Inspired by cognitive studies of common sense, our system first generates a reason or understanding of the sentences and then chooses which statement makes sense, which is achieved by multi-task learning. During the post-evaluation, our system reached 92.9% accuracy in subtask A (rank 11), 89.7% accuracy in subtask B (rank 9), and a BLEU score of 12.9 in subtask C (rank 8).
2018
Aspect and Sentiment Aware Abstractive Review Summarization
Min Yang | Qiang Qu | Ying Shen | Qiao Liu | Wei Zhao | Jia Zhu
Proceedings of the 27th International Conference on Computational Linguistics
Review text has been widely studied in traditional tasks such as sentiment analysis and aspect extraction. However, to date, no work has addressed abstractive review summarization, which is essential for business organizations and individual consumers to make informed decisions. This work takes the lead in studying aspect/sentiment-aware abstractive review summarization by exploring multi-factor attentions. Specifically, we propose an interactive attention mechanism that interactively learns the representations of context words, sentiment words and aspect words within the reviews, acting as an encoder. The learned sentiment and aspect representations are incorporated into the decoder to generate aspect/sentiment-aware review summaries via an attention fusion network. In addition, the abstractive summarizer is jointly trained with the text categorization task, which helps learn a category-specific text encoder, locating salient aspect information and exploring the variations of style and wording of content with respect to different text categories. The experimental results on a real-life dataset demonstrate that our model achieves impressive results compared to other strong competitors.
2017
NLPTEA 2017 Shared Task – Chinese Spelling Check
Gabriel Fung | Maxime Debosschere | Dingmin Wang | Bo Li | Jia Zhu | Kam-Fai Wong
Proceedings of the 4th Workshop on Natural Language Processing Techniques for Educational Applications (NLPTEA 2017)
This paper provides an overview along with our findings of the Chinese Spelling Check shared task at NLPTEA 2017. The goal of this task is to develop a computer-assisted system to automatically diagnose typing errors in traditional Chinese sentences written by students. We defined six types of errors which belong to two categories. Given a sentence, the system should detect where the errors are, and for each detected error determine its type and provide correction suggestions. We designed, constructed, and released a benchmark dataset for this task.
2016
ACE: Automatic Colloquialism, Typographical and Orthographic Errors Detection for Chinese Language
Shichao Dong | Gabriel Pui Cheong Fung | Binyang Li | Baolin Peng | Ming Liao | Jia Zhu | Kam-fai Wong
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: System Demonstrations
We present a system called ACE for Automatic Colloquialism and Errors detection for written Chinese. ACE is based on the combination of an N-gram model and a rule-based model. Although it focuses on detecting colloquial Cantonese (a dialect of Chinese) at the current stage, it can be extended to detect other dialects. We chose Cantonese because it has many interesting properties, such as a unique grammar system and a large number of colloquial terms, that make the detection task extremely challenging. We conducted experiments using real data and synthetic data. The results indicated that ACE is highly reliable and effective.
Co-authors
- Weijie Shi 7
- Jiajie Xu 7
- Yue Cui 4
- Shimin Di 3
- Hanghui Guo 3
- Kam-Fai Wong 3
- Jipeng Zhang 3
- Ruiyuan Zhang 3
- Hao Chen 2
- Gabriel Pui Cheong Fung 2
- Sirui Han 2
- Yidan Liang 2
- Jingjiang Liu 2
- Guoqing Ma 2
- Qingyu Niu 2
- Jiawei Shen 2
- Yilin Wang 2
- Yao Zhao 2
- Xiaofang Zhou 2
- Qijin Chen 1
- Zhangze Chen 1
- Maxime Debosschere 1
- Shichao Dong 1
- Jingzhi Fang 1
- Gabriel Fung 1
- Yike Guo 1
- Jiaming Ji 1
- Xiaokang Jin 1
- Sunny Lai 1
- Kwong Sak Leung 1
- Binyang Li 1
- Bo Li 1
- Jiaming Li 1
- Jiawei Li 1
- Mengze Li 1
- ZhongDong Li 1
- Ming Liao 1
- An Liu 1
- Qiao Liu 1
- Yuanjun Liu 1
- Haolun MA 1
- Baolin Peng 1
- Qiang Qu 1
- Ying Shen 1
- Xiangru Tang 1
- Dingmin Wang 1
- Hongru Wang 1
- Yaguang Wu 1
- Min Yang 1
- Shibo Zhang 1
- Yazhong Zhang 1
- Wei Zhao 1
- Han Zhu 1