Jingying Zeng
2026
A Reward-Guided Dual-Phase Framework for Adaptive Inference-Time Reasoning
Yingqian Cui | Zhenwei Dai | Pengfei He | Bing He | Hui Liu | Zhan Shi | Xianfeng Tang | Jingying Zeng | Suhang Wang | Yue Xing | Jiliang Tang | Benoit Dumoulin
Findings of the Association for Computational Linguistics: ACL 2026
Large Language Models (LLMs) have made strong progress in reasoning. To enhance reasoning performance, a common inference-time approach is tree-based search, which decomposes the reasoning process into multiple steps, expands multiple reasoning paths, and uses reward models to prune and select candidates. However, our exploration shows that this simple decomposition can lead to suboptimal search efficiency: while planning is generally harder, it is execution errors that are more likely to propagate to later steps. This indicates that planning and execution play different roles in reasoning and should be treated differently during tree-based search. To improve search efficiency, we therefore propose a dual-phase test-time scaling framework that separates reasoning into planning and execution and performs search over each phase independently. To further refine the algorithm, we also introduce a dynamic budget allocation mechanism that adaptively redistributes sampling effort based on reward feedback, allowing early stopping on confident steps and reallocation of computation to more challenging steps. Experiments on both math reasoning and code generation benchmarks demonstrate that our approach consistently improves accuracy while reducing redundant computation.
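The dynamic budget allocation idea (stop early on confident steps, spend the saved samples on harder ones) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the allocation rule, so this sketch assumes a fixed base budget per step, a reward threshold for early stopping, and carry-over of unused samples to the next step. The planning/execution split is not modeled, and all names (`search_with_budget`, `per_step_budget`, `threshold`, the toy sampler and reward table) are hypothetical.

```python
from typing import Callable, List, Optional, Tuple


def search_with_budget(
    num_steps: int,
    per_step_budget: int,  # assumed base number of samples per step (must be >= 1)
    sample_candidate: Callable[[int, int], str],  # hypothetical: (step, sample index) -> candidate
    reward: Callable[[str], float],  # hypothetical stand-in for a reward model
    threshold: float,  # assumed confidence threshold for early stopping
) -> Tuple[List[str], List[int]]:
    """Greedy step-by-step search with early stopping and budget carry-over.

    Returns the selected candidate for each step and the number of samples spent on it.
    """
    chosen: List[str] = []
    used: List[int] = []
    carry = 0  # samples left unused by earlier, confident steps
    for step in range(num_steps):
        budget = per_step_budget + carry
        best: Optional[str] = None
        best_r = float("-inf")
        n = 0
        while n < budget:
            cand = sample_candidate(step, n)
            r = reward(cand)
            n += 1
            if r > best_r:
                best, best_r = cand, r
            if best_r >= threshold:  # confident enough: stop sampling this step
                break
        assert best is not None  # guaranteed because per_step_budget >= 1
        carry = budget - n  # pass leftover samples on to the next step
        chosen.append(best)
        used.append(n)
    return chosen, used


# Toy usage: step 0 is easy, step 1 needs more samples than its base budget.
TOY_REWARDS = {"0:0": 0.95, "1:0": 0.2, "1:1": 0.4, "1:2": 0.5, "1:3": 0.85}

chosen, used = search_with_budget(
    num_steps=2,
    per_step_budget=3,
    sample_candidate=lambda step, n: f"{step}:{n}",
    reward=lambda c: TOY_REWARDS.get(c, 0.0),
    threshold=0.8,
)
print(chosen, used)  # ['0:0', '1:3'] [1, 4]
```

In the toy run, step 0 stops after one confident sample, so step 1 gets five samples instead of three and reaches the high-reward candidate `1:3`; with a fixed budget of three it would have settled for `1:2`.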
2025
Stepwise Perplexity-Guided Refinement for Efficient Chain-of-Thought Reasoning in Large Language Models
Yingqian Cui | Pengfei He | Jingying Zeng | Hui Liu | Xianfeng Tang | Zhenwei Dai | Yan Han | Chen Luo | Jing Huang | Zhen Li | Suhang Wang | Yue Xing | Jiliang Tang | Qi He
Findings of the Association for Computational Linguistics: ACL 2025
Chain-of-Thought (CoT) reasoning, which breaks down complex tasks into intermediate reasoning steps, has significantly enhanced the performance of large language models (LLMs) on challenging tasks. However, the detailed reasoning process in CoT often incurs long generation times and high computational costs, partly due to the inclusion of unnecessary steps. To address this, we propose a method to identify critical reasoning steps using perplexity as a measure of their importance: a step is deemed critical if its removal causes a significant increase in perplexity. Our method enables models to focus solely on generating these critical steps. This can be achieved through two approaches: refining demonstration examples in few-shot CoT or fine-tuning the model using selected examples that include only critical steps. Comprehensive experiments validate the effectiveness of our method, which achieves a better balance between the reasoning accuracy and efficiency of CoT.
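The criterion "a step is critical if its removal causes a significant increase in perplexity" can be sketched as a leave-one-out ablation. This is an illustration under stated assumptions, not the paper's procedure: it assumes a single pass that removes one step at a time, an absolute-difference threshold `tau`, and a `perplexity` callable that in practice would be computed by a language model. Here a hypothetical `toy_perplexity` stands in for the model.

```python
from typing import Callable, List, Sequence


def critical_steps(
    steps: Sequence[str],
    perplexity: Callable[[Sequence[str]], float],  # assumed: scores a reasoning chain
    tau: float,  # assumed threshold for a "significant" increase
) -> List[str]:
    """Keep each step whose removal raises perplexity by more than tau."""
    base = perplexity(steps)
    kept: List[str] = []
    for i, step in enumerate(steps):
        ablated = list(steps[:i]) + list(steps[i + 1:])  # drop step i only
        if perplexity(ablated) - base > tau:
            kept.append(step)
    return kept


# Hypothetical stand-in for a language-model perplexity:
# each missing step adds a fixed penalty to a base value of 2.0.
PENALTY = {
    "Restate the question.": 0.05,
    "12 * 3 = 36.": 1.5,
    "36 is even.": 0.05,
    "36 + 4 = 40.": 1.2,
}


def toy_perplexity(seq: Sequence[str]) -> float:
    return 2.0 + sum(p for s, p in PENALTY.items() if s not in seq)


print(critical_steps(list(PENALTY), toy_perplexity, tau=0.5))
# ['12 * 3 = 36.', '36 + 4 = 40.']
```

The surviving steps are the ones a few-shot demonstration or fine-tuning example would retain; the two low-penalty steps are treated as removable.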
A General Framework to Enhance Fine-tuning-based LLM Unlearning
Jie Ren | Zhenwei Dai | Xianfeng Tang | Hui Liu | Jingying Zeng | Zhen Li | Rahul Goutam | Suhang Wang | Yue Xing | Qi He | Hui Liu
Findings of the Association for Computational Linguistics: ACL 2025
Unlearning has been proposed to remove copyrighted and privacy-sensitive data from Large Language Models (LLMs). Existing approaches primarily rely on fine-tuning-based methods, which can be categorized into gradient ascent-based (GA-based) and suppression-based methods. However, they often degrade model utility (the ability to respond to normal prompts). In this work, we aim to develop a general framework that enhances the utility of fine-tuning-based unlearning methods. To achieve this goal, we first investigate the common property of GA-based and suppression-based methods. We find that GA-based methods unlearn by distinguishing the target data (i.e., the data to be removed) and suppressing related generations, which is essentially the same strategy employed by suppression-based methods. Inspired by this finding, we introduce Gated Representation UNlearning (GRUN), which has two components: a soft gate function for distinguishing target data and a suppression module that uses Representation Fine-tuning (ReFT) to adjust representations rather than model parameters. Experiments show that GRUN significantly improves both unlearning effectiveness and model utility. Moreover, it applies generally to fine-tuning-based methods, is efficient, and is promising for sequential unlearning.
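The two GRUN components (a soft gate that recognizes target data and a representation-level suppression edit) can be illustrated as a gated update h' = h + g(h) · Δ(h). This is a minimal sketch, not the authors' implementation: it assumes a logistic gate on a linear score and a LoReFT-style low-rank edit Δ(h) = Rᵀ(Wh + b − Rh). The abstract does not specify either form. In GRUN both parts would be trained; here the weights are hand-set, and all function names are hypothetical.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # row-major, shape (rows, cols)


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def soft_gate(h: Vector, w: Vector, b: float) -> float:
    """Assumed logistic gate in (0, 1); intended to be near 1 for target inputs."""
    z = sum(wi * hi for wi, hi in zip(w, h)) + b
    return 1.0 / (1.0 + math.exp(-z))


def loreft_edit(h: Vector, R: Matrix, W: Matrix, bias: Vector) -> Vector:
    """Assumed LoReFT-style edit R^T (W h + bias - R h), with R of shape (r, d)."""
    target = [wh + bi - rh for wh, bi, rh in zip(matvec(W, h), bias, matvec(R, h))]
    return matvec(transpose(R), target)


def gated_intervention(h: Vector, gate_w: Vector, gate_b: float,
                       R: Matrix, W: Matrix, bias: Vector) -> Vector:
    """Apply the representation edit only to the extent the gate fires."""
    g = soft_gate(h, gate_w, gate_b)
    delta = loreft_edit(h, R, W, bias)
    return [hi + g * di for hi, di in zip(h, delta)]


# Hand-set toy weights (d = 2, r = 1): the edit zeroes the first coordinate,
# and the gate fires when the second coordinate marks an input as "target".
R, W, bias = [[1.0, 0.0]], [[0.0, 0.0]], [0.0]
gate_w, gate_b = [0.0, 10.0], -5.0

out_target = gated_intervention([2.0, 1.0], gate_w, gate_b, R, W, bias)
out_normal = gated_intervention([2.0, 0.0], gate_w, gate_b, R, W, bias)
print(out_target, out_normal)  # first coordinate ~0.013 vs ~1.987
```

The same edit is applied in both cases, but the gate suppresses the target-like representation while leaving the normal one almost unchanged, which is the utility-preserving behavior the abstract describes.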