Bo Zhang
2026
A Scalable Multi-LLM Collaboration System with Retrieval-based Selection and Exploration-Exploitation-Driven Enhancement
Shengji Tang | Jianjian Cao | Weihao Lin | Jiale Hong | Bo Zhang | Shuyue Hu | Lei Bai | Tao Chen | Wanli Ouyang | Peng Ye
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Existing multi-LLM collaboration systems often encounter scalability challenges when integrating new LLMs and tasks, leading to suboptimal performance. To address this, we propose SMCS, a Scalable Multi-LLM Collaboration System designed to effectively coordinate multiple open-source LLMs. The system consists of two core components: a Retrieval-based Prior Selection (RPS) module, which dynamically selects the most suitable LLMs for each input, and an Exploration–Exploitation-Driven Posterior Enhancement (EPE) module, which fosters response diversity and selects high-quality outputs through a hybrid scoring mechanism. Experiments on eight mainstream benchmarks validate the effectiveness of our system: by integrating fifteen open-source LLMs, SMCS outperforms prevailing closed-source LLMs, e.g., GPT-4.1 (**+5.36%**) and GPT-o3-mini (**+5.28%**), across multiple tasks. Remarkably, it even exceeds the average of best results on different datasets with open-source LLMs (**+2.86%**), significantly advancing the empirical performance frontier of open-source collaboration. The code is released at https://github.com/magent4aci/SMCS.
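As a rough illustration of what retrieval-based model selection can look like, the sketch below retrieves the stored queries most similar to a new input and ranks candidate models by their similarity-weighted past scores. It is not the SMCS implementation: the memory layout, the function names (`cosine`, `select_models`), the model names, and all scores are assumptions made up for this example.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_models(query_vec, memory, n_neighbors=2, top_k=2):
    """Rank models by similarity-weighted scores of the nearest stored queries.

    memory is a list of (embedding, {model_name: past_score}) pairs; this
    layout is a hypothetical stand-in for whatever store a real system uses.
    """
    neighbors = sorted(
        memory, key=lambda item: cosine(query_vec, item[0]), reverse=True
    )[:n_neighbors]
    totals = defaultdict(float)
    for embedding, scores in neighbors:
        weight = cosine(query_vec, embedding)
        for model, score in scores.items():
            totals[model] += weight * score
    # Sort by descending total, breaking ties by name for determinism.
    ranked = sorted(totals, key=lambda m: (-totals[m], m))
    return ranked[:top_k]


# Toy memory: two stored queries near the first axis, one near the second.
memory = [
    ([1.0, 0.0], {"model_a": 0.9, "model_b": 0.4, "model_c": 0.1}),
    ([0.9, 0.1], {"model_a": 0.8, "model_b": 0.5, "model_c": 0.2}),
    ([0.0, 1.0], {"model_a": 0.1, "model_b": 0.3, "model_c": 0.9}),
]
selected = select_models([1.0, 0.05], memory)
print(selected)
```

Because selection depends only on stored (embedding, score) records, adding a new model in this sketch amounts to adding its scores to the memory, with no retraining step.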
FlowRAG: Synergizing Explicit Reasoning via Frequency-Aware Multi-Granularity Graph Flow
Bihao Zhan | Zongsheng Cao | Jie Zhou | Bo Zhang | Liang He
Findings of the Association for Computational Linguistics: ACL 2026
Graph-based retrieval-augmented generation (GraphRAG) is effective for knowledge-intensive and multi-hop query tasks; however, many existing methods primarily seed entity-based graphs and rely on implicit semantic relevance propagation. This often (i) under-retrieves when user queries are abstract and semantically sparse at the entity level, and (ii) suffers from brittle multi-hop reasoning, where noisy activations can derail entity-to-entity transitions and corrupt the inferred relation chain, yielding unreliable conclusions. To this end, we propose FlowRAG, a semantic-aware retrieval framework that improves both semantic recall and explicit reasoning. Specifically, FlowRAG constructs a quad-level heterogeneous graph over passages, summaries, sentences, and entities, where summary nodes serve as a coarse semantic hub. At retrieval time, a dual-granularity activation module combines summary–query alignment with sentence-level matching to robustly activate relevant entities under paraphrase and abstraction. We then introduce a frequency-aware weighted flow module that routes relevance through entity–passage links weighted by within-passage term frequency, pruning noisy connections and extracting high-confidence reasoning paths as an explicit logic skeleton for generation. Extensive experiments show that FlowRAG obtains state-of-the-art performance on complex reasoning benchmarks.
MTRouter: Cost-Aware Multi-Turn LLM Routing with History–Model Joint Embeddings
Yiqun Zhang | Hao Li | Zihan Wang | Shi Feng | Xiaocui Yang | Daling Wang | Bo Zhang | Lei Bai | Shuyue Hu
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Multi-turn, long-horizon tasks are increasingly common for large language models (LLMs), but solving them typically requires many sequential model invocations, accumulating substantial inference costs. Here, we study cost-aware multi-turn LLM routing: selecting which model to invoke at each turn from a model pool, given a fixed cost budget. We propose MTRouter, which encodes the interaction history and candidate models into joint history–model embeddings, and learns an outcome estimator from logged trajectories to predict turn-level model utility. Experiments show that MTRouter improves the performance–cost trade-off: on ScienceWorld, it surpasses GPT-5 while reducing total cost by 58.7%; on Humanity’s Last Exam (HLE), it achieves competitive accuracy while reducing total cost by 43.4% relative to GPT-5, and these gains even carry over to held-out tasks. Further analyses reveal several mechanisms underlying its effectiveness: relative to prior multi-turn routers, MTRouter makes fewer model switches, is more tolerant to transient errors, and exhibits emergent specialization across models. Code: https://github.com/ZhangYiqun018/MTRouter
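The routing problem stated in the abstract (pick one model per turn under a fixed cost budget) can be illustrated with a simple greedy rule. This is only a sketch of the problem setting, not MTRouter itself: the learned outcome estimator is replaced by hard-coded utilities, and `route_turn`, `cost_weight`, the model names, and all numbers are hypothetical.

```python
def route_turn(predicted_utility, costs, remaining_budget, cost_weight=0.5):
    """Pick the affordable model with the best utility-minus-cost score.

    predicted_utility stands in for the output of a learned estimator; here
    it is simply a dict supplied by the caller (an assumption of this sketch).
    Returns None when no model fits in the remaining budget.
    """
    affordable = [m for m in costs if costs[m] <= remaining_budget]
    if not affordable:
        return None
    return max(
        affordable,
        key=lambda m: (predicted_utility[m] - cost_weight * costs[m], m),
    )


# Hypothetical per-call costs and a fixed total budget for the episode.
costs = {"small": 1.0, "large": 4.0}
budget = 6.0

# Made-up turn-level utility predictions for a three-turn interaction.
utilities_per_turn = [
    {"small": 0.3, "large": 0.9},
    {"small": 0.6, "large": 0.7},
    {"small": 0.5, "large": 0.95},
]

choices = []
for utilities in utilities_per_turn:
    choice = route_turn(utilities, costs, budget, cost_weight=0.1)
    if choice is None:
        break  # budget exhausted
    choices.append(choice)
    budget -= costs[choice]

print(choices, budget)
```

In this toy run the expensive model is used on the first turn, after which only the cheap model still fits in the budget, which shows why a per-turn greedy rule can be short-sighted and why a learned utility estimate is attractive.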
FlowSearch: Advancing Deep Research with Dynamic Structured Knowledge Flow
Yusong Hu | Runmin Ma | Yue Fan | Jinxin Shi | Zongsheng Cao | Yuhao Zhou | Jiakang Yuan | Shuaiyu Zhang | Shiyang Feng | Xiangchao Yan | Shufei Zhang | Wenlong Zhang | Lei Bai | Bo Zhang
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Deep research is an inherently challenging task that demands both breadth and depth of thinking. It involves navigating diverse knowledge spaces and reasoning over complex, multi-step dependencies, which presents substantial challenges for agentic systems. To address this, we propose FlowSearch, a multi-agent framework that actively constructs and evolves a dynamic structured knowledge flow to drive subtask execution and reasoning. FlowSearch is capable of strategically planning and expanding the knowledge flow to enable parallel exploration and hierarchical task decomposition, while also adjusting the knowledge flow in real time based on feedback from intermediate reasoning outcomes and insights. FlowSearch achieves competitive performance on both general and scientific benchmarks, including GAIA, HLE, GPQA and TRQA, demonstrating its effectiveness in multi-disciplinary research scenarios and its potential to advance scientific discovery. The code will be available.
Easy Samples Are All You Need: Self-Evolving LLMs via Data-Efficient Reinforcement Learning
Zhiyin Yu | Bo Zhang | Qibin Hou | Zhonghai Wu | Xiao Luo | Lei Bai
Findings of the Association for Computational Linguistics: ACL 2026
Previous LLM-based RL studies typically follow either supervised learning with high annotation costs, or unsupervised paradigms using voting or entropy-based rewards. However, their performance remains far from satisfactory due to the substantial annotation cost and issues such as model collapse or reward hacking. To address these issues, we introduce a new perspective inspired by cognitive learning theory and propose a novel approach called EasyRL. The core of EasyRL is to simulate the human cognitive acquisition curve by integrating reliable knowledge transfer from easy labeled data with a progressive divide-and-conquer strategy that tackles increasingly difficult unlabeled data. Specifically, we initialize a warm-up model using supervised RL with few-shot labeled data. This is followed by a divide-and-conquer pseudo-labeling strategy on difficult unlabeled data, combining consistency-based selection for low-uncertainty cases and reflection-based resolution for medium-uncertainty cases. Finally, difficulty-progressive self-training with iterative pseudo-labeling and RL further strengthens the model’s reasoning capability. EasyRL provides a unified self-evolving framework that facilitates data-efficient post-training of LLMs. Experimental results on mathematical and scientific benchmarks demonstrate that EasyRL, using only 10% of easy labeled data, consistently outperforms state-of-the-art baselines.
A Survey of Reinforcement Learning for Large Language Models under Data Scarcity: Challenges and Solutions
Zhiyin Yu | Yuchen Mou | Juncheng Yan | Junyu Luo | Chunchun Chen | Xing Wei | Yunhui Liu | Hongru Sun | Yuxing Zhang | Jun Xu | Yatao Bian | Ming Zhang | Wei Ye | Tieke He | Jie Yang | Guanjie Zheng | Zhonghai Wu | Bo Zhang | Lei Bai | Xiao Luo
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Reinforcement learning (RL) has emerged as a powerful post-training paradigm for enhancing the reasoning capabilities of large language models (LLMs). However, reinforcement learning for LLMs faces substantial data scarcity challenges, including the limited availability of high-quality external supervision and the constrained volume of model-generated experience. These limitations make data-efficient reinforcement learning a critical research direction. In this survey, we present the first systematic review of reinforcement learning for LLMs under data scarcity. We propose a bottom-up hierarchical framework built around three complementary perspectives: the data-centric perspective, the training-centric perspective, and the framework-centric perspective. We develop a taxonomy of existing methods, summarize representative approaches in each category, and analyze their strengths and limitations. Our taxonomy aims to provide a clear conceptual foundation for understanding the design space of data-efficient RL for LLMs and to guide researchers working in this emerging area. We hope this survey offers a comprehensive roadmap for future research and inspires new directions toward more efficient and scalable reinforcement learning post-training for LLMs.
2025
SURVEYFORGE: On the Outline Heuristics, Memory-Driven Generation, and Multi-dimensional Evaluation for Automated Survey Writing
Xiangchao Yan | Shiyang Feng | Jiakang Yuan | Renqiu Xia | Bin Wang | Lei Bai | Bo Zhang
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Survey papers play a crucial role in scientific research, especially given the rapid growth of research publications. Recently, researchers have begun using LLMs to automate survey generation for better efficiency. However, the quality gap between LLM-generated surveys and those written by humans remains significant, particularly in terms of outline quality and citation accuracy. To close these gaps, we introduce SURVEYFORGE, which first generates the outline by analyzing the logical structure of human-written outlines and referring to the retrieved domain-related articles. Subsequently, leveraging high-quality papers retrieved from memory by our scholar navigation agent, SURVEYFORGE can automatically generate and refine the content of the generated article. Moreover, to achieve a comprehensive evaluation, we construct SurveyBench, which includes 100 human-written survey papers for win-rate comparison and assesses AI-generated survey papers across three dimensions: reference, outline, and content quality. Experiments demonstrate that SURVEYFORGE can outperform previous works such as AutoSurvey.
Dolphin: Moving Towards Closed-loop Auto-research through Thinking, Practice, and Feedback
Jiakang Yuan | Xiangchao Yan | Bo Zhang | Tao Chen | Botian Shi | Wanli Ouyang | Yu Qiao | Lei Bai | Bowen Zhou
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
The scientific research paradigm is undergoing a profound transformation owing to the development of Artificial Intelligence (AI). Recent works demonstrate that various AI-assisted research methods can largely improve research efficiency by improving data analysis, accelerating computation, and fostering novel idea generation. To further move towards the ultimate goal (i.e., automatic scientific research), in this paper, we introduce Dolphin, a closed-loop LLM-driven framework to enhance the automation level of scientific research. Dolphin first generates novel ideas based on feedback from previous experiments and relevant papers ranked by the topic and task attributes. Then, the generated ideas can be implemented using a code template refined and debugged with the designed exception-traceback-guided local code structure. Finally, Dolphin automatically analyzes the results of each idea and feeds the results back to the next round of idea generation. Experiments are conducted on the benchmark datasets of different topics and a subset of MLE-bench. Results show that Dolphin can continuously improve the performance of the input topic in a loop. We highlight that Dolphin can automatically propose methods that are comparable to the state-of-the-art in some tasks such as 3D point classification.
Co-authors
- Lei Bai 7
- Xiangchao Yan 3
- Jiakang Yuan 3
- Zongsheng Cao 2
- Tao Chen 2
- Shiyang Feng 2
- Shuyue Hu 2
- Xiao Luo 2
- Wanli Ouyang 2
- Zhonghai Wu 2
- Zhiyin Yu 2
- Yatao Bian 1
- Jianjian Cao 1
- Chunchun Chen 1
- Yue Fan 1
- Shi Feng 1
- Liang He 1
- Tieke He 1
- Jiale Hong 1
- Qibin Hou 1
- Yusong Hu 1
- Hao Li 1
- Weihao Lin 1
- Yunhui Liu 1
- Junyu Luo 1
- Runmin Ma 1
- Yuchen Mou 1
- Yu Qiao 1
- Botian Shi 1
- Jinxin Shi 1
- Hongru Sun 1
- Shengji Tang 1
- Bin Wang 1
- Zihan Wang 1
- Daling Wang 1
- Xing Wei 1
- Renqiu Xia 1
- Jun Xu 1
- Juncheng Yan 1
- Xiaocui Yang 1
- Jie Yang 1
- Peng Ye 1
- Wei Ye 1
- Bihao Zhan 1
- Yiqun Zhang 1
- Shuaiyu Zhang 1
- Shufei Zhang 1
- Wenlong Zhang 1
- Yuxing Zhang 1
- Ming Zhang 1
- Guanjie Zheng 1
- Jie Zhou 1
- Bowen Zhou 1
- Yuhao Zhou 1