Peng Ye
2026
A Scalable Multi-LLM Collaboration System with Retrieval-based Selection and Exploration-Exploitation-Driven Enhancement
Shengji Tang | Jianjian Cao | Weihao Lin | Jiale Hong | Bo Zhang | Shuyue Hu | Lei Bai | Tao Chen | Wanli Ouyang | Peng Ye
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Existing multi-LLM collaboration systems often encounter scalability challenges when integrating new LLMs and tasks, leading to suboptimal performance. To address this, we propose SMCS, a Scalable Multi-LLM Collaboration System designed to effectively coordinate multiple open-source LLMs. The system consists of two core components: a Retrieval-based Prior Selection (RPS) module, which dynamically selects the most suitable LLMs for each input, and an Exploration–Exploitation-Driven Posterior Enhancement (EPE) module, which fosters response diversity and selects high-quality outputs through a hybrid scoring mechanism. Experiments on eight mainstream benchmarks validate the effectiveness of our system: by integrating fifteen open-source LLMs, SMCS outperforms prevailing closed-source LLMs, e.g., GPT-4.1 (**+5.36%**) and GPT-o3-mini (**+5.28%**), across multiple tasks. Remarkably, it even exceeds the average of the best results on different datasets with open-source LLMs (**+2.86%**), significantly advancing the empirical performance frontier of open-source collaboration. The code is released at https://github.com/magent4aci/SMCS.
Nature-Inspired Population-Based Evolution of Large Language Models
Yiqun Zhang | Peng Ye | Xiaocui Yang | Shi Feng | Shufei Zhang | Lei Bai | Wanli Ouyang | Shuyue Hu
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Evolution, the engine behind the survival and growth of life on Earth, operates through the population-based process of reproduction. Inspired by this principle, this paper formally defines a newly emerging problem: the population-based evolution of large language models (LLMs). We introduce a novel framework that starts with a population of parent LLMs and allows this population to evolve through four key operations: (i) crossover, merging the weights of different parents to create offspring LLMs, (ii) mutation, introducing small, random changes to model weights to foster diversity, (iii) selection, prioritizing high-performing models, and (iv) succession, transferring the learned experience from parent to offspring LLMs. With only 200 samples per new task, the LLM population evolves rapidly to adapt to the task at hand, without any gradients. Experiments on 12 datasets show that our framework consistently outperforms existing multi-LLM merging and adaptation methods, achieving relative performance gains of up to 54.8% over the best LLM in the initial population. Moreover, our framework allows for (i) the evolution of LLMs across multiple new tasks simultaneously, (ii) scaling effectively with populations of up to 40 LLMs, and (iii) even zero-shot generalization to unseen held-out tasks. Code: https://github.com/ZhangYiqun018/GENOME
2025
Biology-Instructions: A Dataset and Benchmark for Multi-Omics Sequence Understanding Capability of Large Language Models
Haonan He | Yuchen Ren | Yining Tang | Ziyang Xu | Junxian Li | Minghao Yang | Di Zhang | Yuan Dong | Tao Chen | Shufei Zhang | Yuqiang Li | Nanqing Dong | Wanli Ouyang | Dongzhan Zhou | Peng Ye
Findings of the Association for Computational Linguistics: EMNLP 2025
Large language models (LLMs) have shown remarkable capabilities in general domains, but their application to multi-omics biology remains underexplored. To address this gap, we introduce Biology-Instructions, the first large-scale instruction-tuning dataset for multi-omics biological sequences, including DNA, RNA, proteins, and multi-molecules. This dataset bridges LLMs and complex biological sequence-related tasks, enhancing their versatility and reasoning while maintaining conversational fluency. We also highlight significant limitations of current state-of-the-art LLMs on multi-omics tasks without specialized training. To overcome this, we propose ChatMultiOmics, a strong baseline with a novel three-stage training pipeline, demonstrating superior biological understanding through Biology-Instructions. Both resources are publicly available, paving the way for better integration of LLMs in multi-omics analysis. Biology-Instructions is available at: https://github.com/hhnqqq/Biology-Instructions.
2024
Prompt-fused Framework for Inductive Logical Query Answering
Zezhong Xu | Wen Zhang | Peng Ye | Lei Liang | Huajun Chen
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Answering logical queries on knowledge graphs (KGs) poses a significant challenge for machine reasoning. The primary obstacle in this task stems from the inherent incompleteness of KGs. Existing research has predominantly focused on addressing the issue of missing edges in KGs, thereby neglecting another aspect of incompleteness: the emergence of new entities. Furthermore, most of the existing methods tend to reason over each logical operator separately, rather than comprehensively analyzing the query as a whole during the reasoning process. In this paper, we propose a query-aware prompt-fused framework named Pro-QE, which can incorporate existing query embedding methods and address the embedding of emerging entities through contextual information aggregation. Additionally, a query prompt, which is generated by encoding the symbolic query, is introduced to gather information relevant to the query from a holistic perspective. To evaluate the efficacy of our model in the inductive setting, we introduce two new challenging benchmarks. Experimental results demonstrate that our model successfully handles the issue of unseen entities in logical queries. Furthermore, the ablation study confirms the efficacy of the aggregator and prompt components.
2022
Ruleformer: Context-aware Rule Mining over Knowledge Graph
Zezhong Xu | Peng Ye | Hui Chen | Meng Zhao | Huajun Chen | Wen Zhang
Proceedings of the 29th International Conference on Computational Linguistics
Rule mining is an effective approach for reasoning over knowledge graphs (KGs). Existing works mainly concentrate on mining rules. However, several rules may be applicable when reasoning about a single relation, and how to select appropriate rules for completing different triples has not been discussed. In this paper, we propose to take context information into consideration, which helps select suitable rules for the inference tasks. Based on this idea, we propose a transformer-based rule mining approach, Ruleformer. It consists of two blocks: 1) an encoder that extracts the context information from the subgraph of head entities with a modified attention mechanism, and 2) a decoder that aggregates the subgraph information from the encoder output and generates the probability of relations for each step of reasoning. The basic idea behind Ruleformer is to regard the rule mining process as a sequence-to-sequence task. To make the subgraph a sequence input to the encoder while retaining the graph structure, we devise a relational attention mechanism in the Transformer. The experimental results show the necessity of considering this information in the rule mining task and the effectiveness of our model.
2012
Co-authors
- Wanli Ouyang 3
- Lei Bai 2
- Huajun Chen 2
- Tao Chen 2
- Shuyue Hu 2
- Zezhong Xu 2
- Shufei Zhang 2
- Wen Zhang 2
- Michael Bloodgood 1
- Jianjian Cao 1
- Hui Chen 1
- David Doermann 1
- Nanqing Dong 1
- Yuan Dong 1
- Shi Feng 1
- Haonan He 1
- Jiale Hong 1
- Junxian Li 1
- Yuqiang Li 1
- Lei Liang 1
- Weihao Lin 1
- Yuchen Ren 1
- Paul Rodrigues 1
- Shengji Tang 1
- Yining Tang 1
- Ziyang Xu 1
- Minghao Yang 1
- Xiaocui Yang 1
- David Zajic 1
- Bo Zhang 1
- Di Zhang 1
- Yiqun Zhang 1
- Meng Zhao 1
- Dongzhan Zhou 1