Yue Shen
2026
Perplexity-Aware Data Scaling Law: Perplexity Landscapes Predict Performance for Continual Pre-training
Lei Liu | Hao Zhu | Xiaoyan Yang | Yue Shen | Zhixuan Chu | Jian Wang | Jinjie Gu | Kui Ren
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Continual Pre-training (CPT) serves as a fundamental approach for adapting foundation models to domain-specific applications. Scaling laws for pre-training define a power-law relationship between dataset size and the test loss of an LLM. However, the marginal gains from simply increasing data for CPT diminish rapidly, yielding suboptimal data utilization and inefficient training. To address this challenge, we propose a novel perplexity-aware data scaling law to establish a predictive relationship between the perplexity landscape of domain-specific data and the test loss. Our approach leverages the pre-trained model’s own perplexity on domain data as a proxy for estimating the knowledge gap, effectively quantifying the informational perplexity landscape of candidate training samples. By fitting this scaling law across diverse perplexity regimes, we enable adaptive selection of high-utility data subsets, prioritizing content that maximizes knowledge absorption while minimizing redundancy and noise. Extensive experiments on both medical and general-domain benchmarks demonstrate that our method consistently identifies near-optimal training subsets, achieving superior performance with significantly reduced data consumption.
WebClipper: Efficient Evolution of Web Agents with Graph-based Trajectory Pruning
Junjie Wang | Zequn Xie | Dan Yang | Jie Feng | Yue Shen | Duolin Sun | Meixiu Long | Yihan Jiao | Zhehao Tan | Jian Wang | Peng Wei | Jinjie Gu
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Deep Research systems based on web agents have shown strong potential in solving complex information-seeking tasks, yet their search efficiency remains underexplored. We observe that many state-of-the-art open-source web agents rely on long tool-call trajectories with cyclic reasoning loops and exploration of unproductive branches. To address this, we propose WebClipper, a framework that compresses web agent trajectories via graph-based pruning. Concretely, we model the agent’s search process as a state graph and cast trajectory optimization as a minimum-necessary Directed Acyclic Graph (DAG) mining problem, yielding pruned trajectories that preserve essential reasoning while eliminating redundant steps. Continued training on these refined trajectories enables the agent to evolve toward more efficient search patterns and reduces tool-call rounds by about 20% while improving accuracy. Furthermore, we introduce a new metric called F-AE Score to measure the model’s overall performance in balancing accuracy and efficiency. Experiments demonstrate that WebClipper reduces tool-call rounds while maintaining excellent performance, providing practical insight into balancing effectiveness and efficiency in web agent design.
2025
KnowAgent: Knowledge-Augmented Planning for LLM-Based Agents
Yuqi Zhu | Shuofei Qiao | Yixin Ou | Shumin Deng | Shiwei Lyu | Yue Shen | Lei Liang | Jinjie Gu | Huajun Chen | Ningyu Zhang
Findings of the Association for Computational Linguistics: NAACL 2025
Large Language Models (LLMs) have demonstrated great potential in complex reasoning tasks, yet they fall short when tackling more sophisticated challenges, especially when interacting with environments through generating executable actions. This inadequacy primarily stems from the lack of built-in action knowledge in language agents, which fails to effectively guide the planning trajectories during task solving and results in planning hallucination. To address this issue, we introduce KnowAgent, a novel approach designed to enhance the planning capabilities of LLMs by incorporating explicit action knowledge. Specifically, KnowAgent employs an action knowledge base and a knowledgeable self-learning strategy to constrain the action path during planning, enabling more reasonable trajectory synthesis, and thereby enhancing the planning performance of language agents. Experimental results on HotpotQA and ALFWorld based on various backbone models demonstrate that KnowAgent can achieve comparable or superior performance to existing baselines. Further analysis indicates the effectiveness of KnowAgent in mitigating planning hallucinations.
HIRAG: Hierarchical-Thought Instruction-Tuning Retrieval-Augmented Generation
Yihan Jiao | Zhehao Tan | Dan Yang | Duolin Sun | Jie Feng | Yue Shen | Jian Wang | Peng Wei
Findings of the Association for Computational Linguistics: EMNLP 2025
Retrieval-augmented generation (RAG) has become a fundamental paradigm for addressing the challenges faced by large language models in handling real-time information and domain-specific problems. Traditional RAG systems primarily rely on the in-context learning (ICL) capabilities of the large language model itself. Still, in-depth research on the specific capabilities needed by the RAG generation model is lacking, leading to challenges with inconsistent document quality and retrieval system imperfections. Even the limited studies that fine-tune RAG generative models often lack a granular focus on RAG tasks or a deeper utilization of chain-of-thought processes. To address this, we propose that RAG models should possess three progressively hierarchical abilities: (1) Filtering: the ability to select relevant information; (2) Combination: the ability to combine semantic information across paragraphs; and (3) RAG-specific reasoning: the ability to further process external knowledge using internal knowledge. Thus, we introduce a new RAG instruction fine-tuning method, Hierarchical-Thought Instruction-Tuning Retrieval-Augmented Generation (HIRAG), which incorporates a “think before answering” strategy. This method enhances the model’s open-book examination capability by utilizing multi-level progressive chain-of-thought. Experiments show that the HIRAG training strategy significantly improves the model’s performance on datasets such as RGB, PopQA, MuSiQue, HotpotQA, and PubmedQA.
2024
Learning to Plan for Retrieval-Augmented Large Language Models from Knowledge Graphs
Junjie Wang | Mingyang Chen | Binbin Hu | Dan Yang | Ziqi Liu | Yue Shen | Peng Wei | Zhiqiang Zhang | Jinjie Gu | Jun Zhou | Jeff Z. Pan | Wen Zhang | Huajun Chen
Findings of the Association for Computational Linguistics: EMNLP 2024
Improving the performance of large language models (LLMs) in complex question-answering (QA) scenarios has always been a research focal point. Recent studies have attempted to enhance LLMs’ performance by combining step-wise planning with external retrieval. While effective for advanced models like GPT-3.5, smaller LLMs face challenges in decomposing complex questions, necessitating supervised fine-tuning. Previous work has relied on manual annotation and knowledge distillation from teacher LLMs, which are time-consuming and not accurate enough. In this paper, we introduce a novel framework for enhancing LLMs’ planning capabilities by using planning data derived from knowledge graphs (KGs). LLMs fine-tuned with this data have improved planning capabilities, better equipping them to handle complex QA tasks that involve retrieval. Evaluations on multiple datasets, including our newly proposed benchmark, highlight the effectiveness of our framework and the benefits of KG-derived planning data.
Editing Conceptual Knowledge for Large Language Models
Xiaohan Wang | Shengyu Mao | Shumin Deng | Yunzhi Yao | Yue Shen | Lei Liang | Jinjie Gu | Huajun Chen | Ningyu Zhang
Findings of the Association for Computational Linguistics: EMNLP 2024
Recently, there has been a growing interest in knowledge editing for Large Language Models (LLMs). Current approaches and evaluations explore only instance-level editing, while whether LLMs possess the capability to modify concepts remains unclear. This paper pioneers the investigation of editing conceptual knowledge for LLMs by constructing a novel benchmark dataset, ConceptEdit, and establishing a suite of new metrics for evaluation. The experimental results reveal that, although existing editing methods can efficiently modify concept-level definitions to some extent, they also have the potential to distort the related instantial knowledge in LLMs, leading to poor performance. We anticipate this work can inspire further progress in understanding LLMs.
Unified Hallucination Detection for Multimodal Large Language Models
Xiang Chen | Chenxi Wang | Yida Xue | Ningyu Zhang | Xiaoyan Yang | Qiang Li | Yue Shen | Lei Liang | Jinjie Gu | Huajun Chen
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Despite significant strides in multimodal tasks, Multimodal Large Language Models (MLLMs) are plagued by the critical issue of hallucination. The reliable detection of such hallucinations in MLLMs has, therefore, become a vital aspect of model evaluation and the safeguarding of practical application deployment. Prior research in this domain has been constrained by a narrow focus on singular tasks, an inadequate range of hallucination categories addressed, and a lack of detailed granularity. In response to these challenges, our work expands the investigative horizons of hallucination detection. We present a novel meta-evaluation benchmark, MHaluBench, meticulously crafted to facilitate the evaluation of advancements in hallucination detection methods. Additionally, we unveil a novel unified multimodal hallucination detection framework, UNIHD, which leverages a suite of auxiliary tools to validate the occurrence of hallucinations robustly. We demonstrate the effectiveness of UNIHD through meticulous evaluation and comprehensive analysis. We also provide strategic insights on the application of specific tools for addressing various categories of hallucinations.
Co-authors
- Jinjie Gu 6
- Huajun Chen 4
- Lei Liang 3
- Jian Wang 3
- Peng Wei 3
- Dan Yang 3
- Ningyu Zhang 3
- Shumin Deng 2
- Yihan Jiao 2
- Duolin Sun 2
- Zhehao Tan 2
- Mingyang Chen 1
- Xiang Chen 1
- Zhixuan Chu 1
- Jie Feng 1
- Jie Feng 1
- Binbin Hu 1
- Qiang Li 1
- Lei Liu 1
- Ziqi Liu 1
- Meixiu Long 1
- Shiwei Lyu 1
- Shengyu Mao 1
- Yixin Ou 1
- Jeff Z. Pan 1
- Shuofei Qiao 1
- Kui Ren 1
- Chenxi Wang 1
- Junjie Wang 1
- Junjie Wang 1
- Xiaohan Wang 1
- Zequn Xie 1
- Yida Xue 1
- Xiaoyan Yang 1
- Xiaoyan Yang 1
- Yunzhi Yao 1
- Wen Zhang 1
- Zhiqiang Zhang 1
- Jun Zhou 1
- Hao Zhu 1
- Yuqi Zhu 1