2025
Don’t Half-listen: Capturing Key-part Information in Continual Instruction Tuning
Yongquan He | Wenyuan Zhang | Xuancheng Huang | Peng Zhang | Lingxun Meng | Xiang Zhou | Ke Zeng | Xunliang Cai
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Instruction tuning for large language models (LLMs) can drive them to produce results consistent with human goals in specific downstream tasks. However, continual instruction tuning (CIT) of LLMs may bring about the catastrophic forgetting (CF) problem, where previously learned abilities degrade. Recent methods try to alleviate CF by modifying models or replaying data, but the resulting models may memorize only the surface-level patterns of instructions and become confused on held-out tasks. In this paper, we propose a novel continual instruction tuning method based on Key-part Information Gain (KPIG). Our method computes the information gain on masked parts to dynamically replay data and refine the training objective, which enables LLMs to capture task-aware information relevant to the correct response and alleviates overfitting to general descriptions in instructions. In addition, we propose two metrics, P-score and V-score, to measure the generalization and instruction-following abilities of LLMs. Experiments demonstrate that our method achieves superior performance on both seen and held-out tasks.
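As a rough illustration of the core idea, the sketch below scores a key-part information gain as the drop in response log-probability when a key span of the instruction is masked; the function names, the `[MASK]` convention, and the averaging rule are our own assumptions, not the paper's exact formulation.

```python
# Minimal sketch of key-part information gain (hypothetical names; KPIG's
# exact formulation is given in the paper).

from typing import Callable, List

def key_part_information_gain(
    instruction: str,
    response: str,
    key_spans: List[str],
    response_logprob: Callable[[str, str], float],
    mask_token: str = "[MASK]",
) -> float:
    """Average drop in response log-probability when key parts are masked.

    A large gain means the response genuinely depends on the key parts,
    not just on the instruction's generic wording.
    """
    full = response_logprob(instruction, response)
    gains = []
    for span in key_spans:
        masked = instruction.replace(span, mask_token)
        gains.append(full - response_logprob(masked, response))
    return sum(gains) / len(gains) if gains else 0.0

# Toy usage with a stand-in scorer (a real one would query an LLM):
if __name__ == "__main__":
    def toy_scorer(instr: str, resp: str) -> float:
        # Pretend the response is likelier when "translate" is visible.
        return -1.0 if "translate" in instr else -3.0

    ig = key_part_information_gain(
        "Please translate the sentence into French.",
        "Bonjour le monde.",
        key_spans=["translate"],
        response_logprob=toy_scorer,
    )
    print(f"information gain: {ig:.2f}")  # 2.00
```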
Enhancing Efficiency and Exploration in Reinforcement Learning for LLMs
Mengqi Liao | Xiangyu Xi | Chen Ruinian | Jia Leng | Yangen Hu | Ke Zeng | Shuai Liu | Huaiyu Wan
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Reasoning large language models (LLMs) excel at complex tasks, which has drawn significant attention to reinforcement learning (RL) for LLMs. However, existing approaches allocate an equal number of rollouts to every question during the RL process, which is inefficient: training on simple questions yields limited gains, whereas challenging questions need more rollouts to sample correct answers. Furthermore, while RL improves response precision, it limits the model’s exploration ability, potentially resulting in a performance cap below that of the base model prior to RL. To address these issues, we propose a mechanism for dynamically allocating rollout budgets based on the difficulty of the problems, enabling more efficient RL training. Additionally, we introduce an adaptive dynamic temperature adjustment strategy to maintain entropy at a stable level, thereby encouraging sufficient exploration. This enables LLMs to improve response precision while preserving their exploratory ability to uncover potentially correct pathways. The code and data are available at: https://anonymous.4open.science/r/E3-RL4LLMs-DB28
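The two mechanisms lend themselves to a compact sketch: a difficulty-proportional rollout allocator and an entropy-tracking temperature update. The proportional rule, the linear update, and all parameter names below are illustrative assumptions, not the paper's exact formulas.

```python
# Minimal sketch of the two mechanisms described above (hypothetical rules).

def allocate_rollouts(pass_rates, total_budget, min_rollouts=1):
    """Give harder questions (low pass rate) a larger share of the rollouts."""
    difficulties = [1.0 - p for p in pass_rates]
    z = sum(difficulties) or 1.0
    return [max(min_rollouts, round(total_budget * d / z)) for d in difficulties]

def adjust_temperature(temp, entropy, target_entropy, lr=0.1,
                       t_min=0.5, t_max=1.5):
    """Nudge the sampling temperature so policy entropy stays near a target:
    raise it when exploration collapses, lower it when output is too random."""
    temp += lr * (target_entropy - entropy)
    return min(t_max, max(t_min, temp))

# Toy usage:
print(allocate_rollouts(pass_rates=[0.9, 0.5, 0.1], total_budget=24))  # [2, 8, 14]
print(adjust_temperature(temp=1.0, entropy=0.8, target_entropy=1.2))   # 1.04
```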
A Reasoner for Real-World Event Detection: Scaling Reinforcement Learning via Adaptive Perplexity-Aware Sampling Strategy
Xiaoyun Zhang | Jingqing Ruan | Xing Ma | Yawen Zhu | Jiansong Chen | Ke Zeng | Xunliang Cai
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track
Detecting abnormal events in real-world customer service dialogues is highly challenging due to the complexity of business data and the dynamic nature of customer interactions. Moreover, models must demonstrate strong out-of-domain (OOD) generalization to enable rapid adaptation across different business scenarios and maximize commercial value. In this work, we propose a novel Adaptive Perplexity-Aware Reinforcement Learning (APARL) framework that leverages the advanced reasoning capabilities of large language models for abnormal event detection. APARL introduces a dual-loop dynamic curriculum learning architecture, enabling the model to progressively focus on more challenging samples as its proficiency increases. This design effectively addresses performance bottlenecks and significantly enhances OOD transferability. Extensive evaluations on food delivery dialogue tasks show that our model achieves significantly enhanced adaptability and robustness, attaining the highest F1 score with an average improvement of 17.19%, and an average improvement of 9.59% in OOD transfer tests. This method provides a superior solution for industrial deployment of anomaly detection models, contributing to improved operational efficiency and commercial benefits.
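A minimal sketch of perplexity-aware curriculum sampling is given below; the Gaussian weighting kernel and the `focus_ppl` knob are hypothetical stand-ins for APARL's dual-loop schedule, shown only to make the idea of "focusing on samples near the model's frontier" concrete.

```python
# Illustrative perplexity-aware sampling (not APARL's exact procedure).

import math
import random

def sampling_weights(perplexities, focus_ppl, width=1.0):
    """Weight examples by how close their perplexity is to the current focus
    point; a Gaussian kernel in log-space keeps batches near the frontier."""
    return [
        math.exp(-((math.log(p) - math.log(focus_ppl)) ** 2) / (2 * width ** 2))
        for p in perplexities
    ]

def sample_batch(examples, perplexities, focus_ppl, k):
    w = sampling_weights(perplexities, focus_ppl)
    return random.choices(examples, weights=w, k=k)

# As proficiency rises, an outer loop would move focus_ppl upward so the
# model progressively sees harder (higher-perplexity) dialogues.
print(sample_batch(["easy", "medium", "hard"], [2.0, 8.0, 40.0],
                   focus_ppl=8.0, k=4))
```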
2024
Learning or Self-aligning? Rethinking Instruction Fine-tuning
Mengjie Ren | Boxi Cao | Hongyu Lin | Cao Liu | Xianpei Han | Ke Zeng | Wan Guanglu | Xunliang Cai | Le Sun
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Instruction Fine-tuning (IFT) is a crucial phase in building large language models (LLMs). Previous works mainly focus on IFT’s role in the transfer of behavioral norms and the learning of additional world knowledge. However, the understanding of the underlying mechanisms of IFT remains significantly limited. In this paper, we design a knowledge intervention framework to decouple the potential underlying factors of IFT, thereby enabling individual analysis of each factor. Surprisingly, our experiments reveal that attempting to learn additional world knowledge through IFT often struggles to yield positive impacts and can even lead to markedly negative effects. Further, we discover that maintaining internal knowledge consistency before and after IFT is a critical factor for achieving successful IFT. Our findings reveal the underlying mechanisms of IFT and provide robust support for several very recent works and for potential future directions.
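One measurable ingredient of this analysis, the consistency of a model's knowledge before and after IFT, can be sketched as a simple agreement rate over factual probes; the probe format and the `answer_*` callables are assumptions for illustration, not the paper's full intervention framework.

```python
# Illustrative before/after knowledge-consistency check (hypothetical helpers).

from typing import Callable, Dict, List

def knowledge_consistency(
    probes: List[Dict[str, str]],          # [{"question": ..., "gold": ...}]
    answer_before: Callable[[str], str],   # base model's answer function
    answer_after: Callable[[str], str],    # instruction-tuned model's answers
) -> float:
    """Fraction of factual probes answered identically before and after IFT."""
    same = sum(answer_before(p["question"]) == answer_after(p["question"])
               for p in probes)
    return same / len(probes)

# Toy usage:
probes = [{"question": "Capital of France?", "gold": "Paris"}]
print(knowledge_consistency(probes, lambda q: "Paris", lambda q: "Paris"))  # 1.0
```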
Pattern Shifting or Knowledge Losing? A Forgetting Perspective for Understanding the Effect of Instruction Fine-Tuning
Chunkang Zhang | Boxi Cao | Yaojie Lu | Hongyu Lin | Liu Cao | Ke Zeng | Guanglu Wan | Xunliang Cai | Xianpei Han | Le Sun
Proceedings of the 23rd Chinese National Conference on Computational Linguistics (Volume 1: Main Conference)
Instruction Fine-Tuning (IFT) emerges as an essential step of training large language models to robustly carry out tasks of interest. However, a systematic investigation of the underlying mechanisms of instruction fine-tuning is still lacking, particularly regarding the forgetting phenomenon after IFT, known as alignment tax. Therefore, to understand the mechanism of IFT from the forgetting perspective, we investigate the alteration of the text pattern and knowledge within models throughout the entire IFT process. Specifically, we restore fine-tuned models to their base version by training them on data sharing a similar distribution with the pre-training corpus and compare their results. Our experiment indicates that there is a stage transition of forgetting during the IFT process: (1) Pseudo Forgetting: in this stage, models mainly shift their familiar text pattern away from the pre-training data format while the world knowledge is preserved. Consequently, models will recover their original performance when they are restored to the base version. (2) Actual Forgetting: in this stage, models forget the acquired knowledge as well. Therefore, they fail to reach the original performance even if they are restored to the base version.
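The restore-and-compare protocol suggests a simple decision rule for labeling the two stages; the tolerance threshold and the helper below are our own illustrative assumptions, not the paper's evaluation setup.

```python
# Illustrative decision rule for the two forgetting stages (hypothetical).

def classify_forgetting(base_score, ift_score, restored_score, tol=0.02):
    """Label a benchmark drop after IFT by whether restoration recovers it."""
    if ift_score >= base_score - tol:
        return "no forgetting"
    if restored_score >= base_score - tol:
        return "pseudo forgetting"    # text pattern shifted, knowledge intact
    return "actual forgetting"        # the knowledge itself was lost

print(classify_forgetting(0.70, 0.55, 0.69))  # pseudo forgetting
print(classify_forgetting(0.70, 0.55, 0.58))  # actual forgetting
```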
Dual-Stage Multi-Task Syntax-Oriented Pre-Training for Syntactically Controlled Paraphrase Generation
Hongxu Liu | Xiaojie Wang | Jiashen Sun | Ke Zeng | Wan Guanglu
Findings of the Association for Computational Linguistics: ACL 2024
Syntactically Controlled Paraphrase Generation (SCPG), which aims at generating sentences whose syntactic structures resemble given exemplars, has attracted increasing research effort in recent years. We conducted an empirical survey of previous SCPG datasets and methods and found three tacitly accepted yet seldom mentioned intrinsic shortcomings/trade-offs in terms of data acquisition, task formulation, and pre-training strategies. To mitigate these shortcomings, we proposed a novel Dual-Stage Multi-Task (DSMT) pre-training scheme involving a series of structure-oriented and syntax-oriented tasks, which, in our view, gives sequential text models the ability to comprehend intrinsically non-sequential structures such as Linearized Constituency Trees (LCTs), to understand the underlying syntax, and even to generate such trees by parsing sentences. We performed further pre-training of the popular T5 model on these novel tasks and fine-tuned the trained model on every variant of the SCPG task in the literature, finding that our models significantly outperformed (by up to 10+ BLEU-4) previous state-of-the-art methods. Finally, we carried out ablation studies that demonstrate the effectiveness of our DSMT methods and highlight the SCPG performance gains over vanilla T5 models, especially on hard samples and under few-shot settings.
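For readers unfamiliar with LCTs, the sketch below linearizes an NLTK constituency tree into a bracketed token sequence, the kind of non-sequential structure the DSMT tasks expose to a sequential model; the exact linearization used in the paper may differ.

```python
# Illustrative linearization of a constituency tree (not the paper's exact
# LCT format). Requires: pip install nltk

from nltk import Tree

def linearize(tree) -> str:
    """Flatten a constituency tree into a bracketed token sequence."""
    if isinstance(tree, str):  # leaf word
        return tree
    inner = " ".join(linearize(child) for child in tree)
    return f"( {tree.label()} {inner} )"

parse = Tree.fromstring("(S (NP (PRP He)) (VP (VBD slept)))")
print(linearize(parse))
# ( S ( NP ( PRP He ) ) ( VP ( VBD slept ) ) )
```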