Shufei Zhang
2026
Step-GRPO: Internalizing Dynamic Early Exit for Efficient Reasoning
Benteng Chen | Weida Wang | Shufei Zhang | Mingbao Lin | Min Zhang
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Large reasoning models that use long chain-of-thought excel at problem-solving yet waste compute on redundant checks. Curbing this overthinking is hard: training-time length penalties can cripple ability, while inference-time early-exit adds system overhead. To bridge this gap, we propose **Step-GRPO**, a novel post-training framework that internalizes dynamic early-exit capabilities directly into the model. Step-GRPO shifts the optimization objective from raw tokens to semantic steps by utilizing linguistic markers to structure reasoning. We introduce a Dynamic Truncated Rollout mechanism that exposes the model to concise high-confidence trajectories during exploration, synergized with a Step-Aware Relative Reward that dynamically penalizes redundancy based on group-level baselines. Extensive experiments across three model sizes on diverse benchmarks demonstrate that Step-GRPO achieves a superior accuracy-efficiency trade-off. On Qwen3-8B, our method reduces token consumption by 32.0% compared to the vanilla model while avoiding the accuracy degradation observed in traditional length-penalty methods.
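The abstract does not give the reward formula, so the following is only an illustrative sketch of the general idea: count semantic steps with linguistic markers, penalize correct answers that use more steps than the group baseline, then normalize rewards within the group as in standard GRPO. The marker list, the `penalty` weight, the choice of baseline (mean step count of correct rollouts), and all function names are assumptions, not the authors' definitions.

```python
import re
from statistics import mean, pstdev

# Hypothetical marker set; the paper's actual markers are not listed in the abstract.
MARKERS = re.compile(r"\b(?:wait|alternatively|hmm)\b", re.IGNORECASE)

def count_steps(text):
    """Count reasoning steps as 1 plus the number of marker occurrences."""
    return 1 + len(MARKERS.findall(text))

def step_aware_rewards(rollouts, penalty=0.1):
    """rollouts: list of (text, is_correct) pairs sampled for one prompt.

    Assumed scheme: a correct rollout earns 1.0 minus a penalty for every
    step beyond the group baseline; an incorrect rollout earns 0.0.
    """
    steps = [count_steps(text) for text, _ in rollouts]
    correct = [s for s, (_, ok) in zip(steps, rollouts) if ok]
    baseline = mean(correct) if correct else mean(steps)
    rewards = []
    for s, (_, ok) in zip(steps, rollouts):
        if ok:
            rewards.append(1.0 - penalty * max(0.0, s - baseline))
        else:
            rewards.append(0.0)
    return rewards

def group_advantages(rewards, eps=1e-6):
    """Standard GRPO-style normalization: subtract group mean, divide by group std."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

rollouts = [
    ("Compute 2+2. The answer is 4.", True),
    ("Compute 2+2. Wait, check again. Hmm, alternatively add. The answer is 4.", True),
    ("Guess 5.", False),
]
rewards = step_aware_rewards(rollouts)
advantages = group_advantages(rewards)
```

Under these assumptions, the concise correct rollout receives a higher advantage than the redundant correct one, and both rank above the incorrect rollout.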
FlowSearch: Advancing Deep Research with Dynamic Structured Knowledge Flow
Yusong Hu | Runmin Ma | Yue Fan | Jinxin Shi | Zongsheng Cao | Yuhao Zhou | Jiakang Yuan | Shuaiyu Zhang | Shiyang Feng | Xiangchao Yan | Shufei Zhang | Wenlong Zhang | Lei Bai | Bo Zhang
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Deep research is an inherently challenging task that demands both breadth and depth of thinking. It involves navigating diverse knowledge spaces and reasoning over complex, multi-step dependencies, which presents substantial challenges for agentic systems. To address this, we propose FlowSearch, a multi-agent framework that actively constructs and evolves a dynamic structured knowledge flow to drive subtask execution and reasoning. FlowSearch is capable of strategically planning and expanding the knowledge flow to enable parallel exploration and hierarchical task decomposition, while also adjusting the knowledge flow in real time based on feedback from intermediate reasoning outcomes and insights. FlowSearch achieves competitive performance on both general and scientific benchmarks, including GAIA, HLE, GPQA and TRQA, demonstrating its effectiveness in multi-disciplinary research scenarios and its potential to advance scientific discovery. The code will be available.
Nature-Inspired Population-Based Evolution of Large Language Models
Yiqun Zhang | Peng Ye | Xiaocui Yang | Shi Feng | Shufei Zhang | Lei Bai | Wanli Ouyang | Shuyue Hu
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Evolution, the engine behind the survival and growth of life on Earth, operates through the population-based process of reproduction. Inspired by this principle, this paper formally defines a newly emerging problem: the population-based evolution of large language models (LLMs). We introduce a novel framework that starts with a population of parent LLMs and allows this population to evolve through four key operations: (i) crossover, merging the weights of different parents to create offspring LLMs, (ii) mutation, introducing small, random changes to model weights to foster diversity, (iii) selection, prioritizing high-performing models, and (iv) succession, transferring the learned experience from parent to offspring LLMs. With only 200 samples per new task, the LLM population evolves rapidly to adapt to the task at hand, without any gradients. Experiments on 12 datasets show that our framework consistently outperforms existing multi-LLM merging and adaptation methods, achieving relative performance gains of up to 54.8 over the best LLM in the initial population. Moreover, our framework allows for (i) the evolution of LLMs across multiple new tasks simultaneously, (ii) scaling effectively with populations of up to 40 LLMs, and (iii) even zero-shot generalization to unseen held-out tasks. Code: https://github.com/ZhangYiqun018/GENOME
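A minimal, gradient-free sketch of the crossover, mutation, and selection operations described above, with each "model" reduced to a plain list of floats. The interpolation-based crossover, Gaussian mutation, elitist selection, and every name and hyperparameter here are illustrative assumptions rather than the GENOME implementation; the succession operation is omitted because the abstract does not specify how experience is transferred.

```python
import random

def crossover(parent_a, parent_b, alpha):
    """Merge two weight vectors by linear interpolation (assumed merge rule)."""
    return [alpha * a + (1.0 - alpha) * b for a, b in zip(parent_a, parent_b)]

def mutate(weights, rng, scale=0.05):
    """Add small Gaussian noise to every weight to foster diversity."""
    return [w + rng.gauss(0.0, scale) for w in weights]

def evolve(population, fitness, generations=30, seed=0):
    """Evolve a population of weight vectors without gradients.

    Each generation creates one offspring per population slot, then keeps
    the best individuals among parents and offspring (elitist selection).
    """
    rng = random.Random(seed)
    size = len(population)
    for _ in range(generations):
        offspring = []
        for _ in range(size):
            p1, p2 = rng.sample(population, 2)
            child = crossover(p1, p2, rng.random())
            offspring.append(mutate(child, rng))
        ranked = sorted(population + offspring, key=fitness, reverse=True)
        population = ranked[:size]
    return population

# Toy task: fitness is the negative squared distance to a target vector,
# standing in for accuracy on a small set of task samples.
target = [1.0, -2.0, 0.5]

def toy_fitness(weights):
    return -sum((w - t) ** 2 for w, t in zip(weights, target))

init_rng = random.Random(1)
initial = [[init_rng.uniform(-3.0, 3.0) for _ in range(3)] for _ in range(6)]
final = evolve(initial, toy_fitness)
```

Because selection keeps parents in the candidate pool, the best fitness in this sketch can never decrease from one generation to the next.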
2025
LLaMA-Berry: Pairwise Optimization for Olympiad-level Mathematical Reasoning via O1-like Monte Carlo Tree Search
Di Zhang | Jianbo Wu | Jingdi Lei | Tong Che | Jiatong Li | Tong Xie | Xiaoshui Huang | Shufei Zhang | Marco Pavone | Yuqiang Li | Wanli Ouyang | Dongzhan Zhou
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
This paper presents LLaMA-Berry, an advanced mathematical reasoning framework to enhance the problem-solving ability of large language models (LLMs). The framework combines Monte Carlo Tree Search with Self-Refine (SR-MCTS) to optimize the reasoning paths and utilizes a pairwise reward model to evaluate different paths globally. By leveraging the self-critique and rewriting capabilities of LLMs, our SR-MCTS overcomes the inefficiencies and limitations of conventional step-wise and greedy search algorithms, enabling a more efficient exploration of solution spaces. To guide the search process, we propose the Pairwise Preference Reward Model (PPRM), which predicts pairwise preferences between solutions through instruction-following capabilities trained by Reinforcement Learning from Human Feedback (RLHF). Finally, the Enhanced Borda Count (EBC) method is adopted to synthesize pairwise preferences into global quantile scores for evaluations. This approach mitigates the challenges of scoring variability and non-independent distributions in mathematical reasoning tasks. The framework has been tested on general and advanced benchmarks, showing superior search efficiency and performance compared to existing open-source and closed-source methods, particularly in complex Olympiad-level benchmarks, including AIME24 and AMC23.
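To illustrate how pairwise preferences can be aggregated into a global score, here is a sketch of a plain Borda-style count: each solution scores one point per pairwise win, and the total is normalized to [0, 1]. This is the textbook idea only; the paper's Enhanced Borda Count is not specified in the abstract, and the `prefers` callback below is a hypothetical stand-in for the pairwise reward model.

```python
def borda_wins(candidates, prefers):
    """Count pairwise wins for each candidate.

    prefers(a, b) must return True when a is preferred over b.
    """
    wins = {c: 0 for c in candidates}
    for i, a in enumerate(candidates):
        for b in candidates[i + 1:]:
            if prefers(a, b):
                wins[a] += 1
            else:
                wins[b] += 1
    return wins

def normalized_scores(wins):
    """Map win counts to [0, 1]: the fraction of other candidates beaten."""
    n = len(wins)
    if n <= 1:
        return {c: 1.0 for c in wins}
    return {c: w / (n - 1) for c, w in wins.items()}

# Toy preference oracle backed by a hidden quality value per solution.
quality = {"sol_a": 3, "sol_b": 1, "sol_c": 2}
wins = borda_wins(list(quality), lambda a, b: quality[a] > quality[b])
scores = normalized_scores(wins)
```

With a consistent oracle such as this one, the normalized scores reproduce the hidden ranking; with a noisy pairwise model, the count averages over individual comparison errors.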
Biology-Instructions: A Dataset and Benchmark for Multi-Omics Sequence Understanding Capability of Large Language Models
Haonan He | Yuchen Ren | Yining Tang | Ziyang Xu | Junxian Li | Minghao Yang | Di Zhang | Yuan Dong | Tao Chen | Shufei Zhang | Yuqiang Li | Nanqing Dong | Wanli Ouyang | Dongzhan Zhou | Peng Ye
Findings of the Association for Computational Linguistics: EMNLP 2025
Large language models (LLMs) have shown remarkable capabilities in general domains, but their application to multi-omics biology remains underexplored. To address this gap, we introduce Biology-Instructions, the first large-scale instruction-tuning dataset for multi-omics biological sequences, including DNA, RNA, proteins, and multi-molecules. This dataset bridges LLMs and complex biological sequence-related tasks, enhancing their versatility and reasoning while maintaining conversational fluency. We also highlight significant limitations of current state-of-the-art LLMs on multi-omics tasks without specialized training. To overcome this, we propose ChatMultiOmics, a strong baseline with a novel three-stage training pipeline, demonstrating superior biological understanding through Biology-Instructions. Both resources are publicly available, paving the way for better integration of LLMs in multi-omics analysis. Biology-Instructions is available at: https://github.com/hhnqqq/Biology-Instructions.
ReKG-MCTS: Reinforcing LLM Reasoning on Knowledge Graphs via Training-Free Monte Carlo Tree Search
Xiaozhuang Song | Shufei Zhang | Tianshu Yu
Findings of the Association for Computational Linguistics: ACL 2025
Recent advancements in combining knowledge graphs (KGs) with large language models (LLMs) have demonstrated promising potential in complex KG reasoning tasks, yet existing approaches face limitations in path exploration strategies or excessive computational overhead. We propose ReKG-MCTS, a novel training-free framework that synergizes Monte Carlo Tree Search (MCTS) with LLM capabilities to enable dynamic reasoning over KGs. The framework conceptualizes KG reasoning as a decision-making process, where MCTS strategically explores paths over KG while LLMs provide semantic guidance for reasoning paths. The framework consists of four phases: (1) UCB-based node selection that balances exploration-exploitation on KG, (2) path expansion with KG structural constraints, (3) LLM-guided MC rollouts for simulation, and (4) value backpropagation. Experimental results on WebQSP and CWQ demonstrate that ReKG-MCTS outperforms existing training-free methods and achieves competitive performance compared to fine-tuned baselines. These findings suggest a new paradigm for leveraging language models in KG reasoning tasks. The code is available at https://github.com/ShawnKS/rekgmcts.
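Phases (1) and (4) above can be sketched with the standard UCB1 rule and a simple value backup. This is a generic MCTS fragment, not the ReKG-MCTS code: the dictionary-based node layout, the exploration constant `c`, and the function names are assumptions, and the KG-constrained expansion and LLM-guided rollout phases are left out.

```python
import math

def ucb1(total_value, visits, parent_visits, c=math.sqrt(2)):
    """UCB1 score: mean value plus an exploration bonus.

    Unvisited nodes score infinity so that each child is tried once.
    """
    if visits == 0:
        return math.inf
    exploit = total_value / visits
    explore = c * math.sqrt(math.log(parent_visits) / visits)
    return exploit + explore

def select_child(children, parent_visits, c=math.sqrt(2)):
    """Pick the child node with the highest UCB1 score."""
    return max(
        children,
        key=lambda ch: ucb1(ch["value"], ch["visits"], parent_visits, c),
    )

def backpropagate(path, reward):
    """Add the rollout reward to every node on the path from root to leaf."""
    for node in path:
        node["visits"] += 1
        node["value"] += reward

# Toy example: three candidate relations leaving one KG entity.
root = {"name": "entity", "visits": 10, "value": 6.0}
children = [
    {"name": "rel_a", "visits": 6, "value": 4.5},
    {"name": "rel_b", "visits": 4, "value": 1.5},
    {"name": "rel_c", "visits": 0, "value": 0.0},
]
chosen = select_child(children, root["visits"])
backpropagate([root, chosen], reward=1.0)
```

Setting `c=0` turns the rule into pure exploitation of the highest mean value, while larger values favor rarely visited branches.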
Co-authors
- Wanli Ouyang 3
- Lei Bai 2
- Yuqiang Li 2
- Peng Ye 2
- Di Zhang 2
- Dongzhan Zhou 2
- Zongsheng Cao 1
- Tong Che 1
- Benteng Chen 1
- Tao Chen 1
- Nanqing Dong 1
- Yuan Dong 1
- Yue Fan 1
- Shi Feng 1
- Shiyang Feng 1
- Haonan He 1
- Shuyue Hu 1
- Yusong Hu 1
- Xiaoshui Huang 1
- Jingdi Lei 1
- Jiatong Li 1
- Junxian Li 1
- Mingbao Lin 1
- Runmin Ma 1
- Marco Pavone 1
- Yuchen Ren 1
- Jinxin Shi 1
- Xiaozhuang Song 1
- Yining Tang 1
- Weida Wang 1
- Jianbo Wu 1
- Tong Xie 1
- Ziyang Xu 1
- Xiangchao Yan 1
- Minghao Yang 1
- Xiaocui Yang 1
- Tianshu Yu 1
- Jiakang Yuan 1
- Bo Zhang 1
- Min Zhang 1
- Shuaiyu Zhang 1
- Wenlong Zhang 1
- Yiqun Zhang 1
- Yuhao Zhou 1