Cheng Yang
2026
MASFactory: A Graph-centric Framework for Orchestrating LLM-Based Multi-Agent Systems with Vibe Graphing
Yang Liu | Jinxuan Cai | Yishen Li | Qi Meng | Zedi Liu | Xin Li | Chen Qian | Chuan Shi | Cheng Yang
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)
Large language model-based (LLM-based) multi-agent systems (MAS) are increasingly used to extend agentic problem solving via role specialization and collaboration. MAS workflows can be naturally modeled as directed computation graphs, where nodes execute agents or sub-workflows and edges encode dependencies and message passing. However, implementing complex graph workflows in current frameworks still requires substantial manual effort, offers limited reuse, and makes it difficult to integrate heterogeneous external context sources. To overcome these limitations, we present MASFactory, a graph-centric framework for orchestrating LLM-based MAS. It introduces Vibe Graphing, a human-in-the-loop approach that compiles natural-language intent into an editable workflow specification and then into an executable graph. In addition, the framework provides reusable components, skill support, multimodal message handling, and pluggable context integration, as well as a visualizer for topology preview, runtime tracing, and human-in-the-loop interaction. We evaluate MASFactory on seven public benchmarks, validating both reproduction consistency for representative MAS methods and the effectiveness of Vibe Graphing. Our code (https://github.com/BUPT-GAMMA/MASFactory, licensed under Apache-2.0) and video demonstration (https://youtu.be/ANynzVfY32k) are publicly available.
2025
Seq1F1B: Efficient Sequence-Level Pipeline Parallelism for Large Language Model Training
Ao Sun | Weilin Zhao | Xu Han | Cheng Yang | Xinrong Zhang | Zhiyuan Liu | Chuan Shi | Maosong Sun
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
Training large language models (LLMs) relies heavily on distributed training strategies, among which pipeline parallelism (PP) plays a crucial role. As training sequences extend to 32k or even 128k tokens, current PP methods face severe bottlenecks, including substantial pipeline bubbles and a high memory footprint, which greatly hinder training throughput and model scalability. This paper introduces a sequence-level one-forward-one-backward (1F1B) PP method, named Seq1F1B, tailored for training LLMs on long sequences with high training throughput and memory efficiency. Unlike typical PP methods, which adopt a batch-level pipeline schedule, Seq1F1B schedules the pipeline of training LLMs at the sequence level. It uses a computational strategy to partition sequences appropriately, significantly reducing pipeline bubbles and memory footprint. Compared to competitive PP baselines such as Megatron 1F1B PP, Seq1F1B achieves 1.14× training throughput with half the memory footprint. Notably, Seq1F1B trains an LLM with 30B parameters on sequences up to 64k tokens using 64 NVIDIA A100 GPUs without using recomputation strategies, a feat unachievable with existing methods. We have released our code on GitHub to facilitate further research and development in LLM training on long sequences: https://github.com/thunlp/Seq1F1B.
Optima: Optimizing Effectiveness and Efficiency for LLM-Based Multi-Agent System
Weize Chen | Jiarui Yuan | Chen Qian | Cheng Yang | Zhiyuan Liu | Maosong Sun
Findings of the Association for Computational Linguistics: ACL 2025
Large Language Model (LLM) based multi-agent systems (MAS) show remarkable potential in collaborative problem-solving, yet they still face critical challenges: low communication efficiency, poor scalability, and a lack of effective parameter-updating optimization methods. We present Optima, a novel framework that addresses these issues by significantly enhancing both communication efficiency and task effectiveness in LLM-based MAS through training. Optima employs an iterative generate, rank, select, and train paradigm with a reward function balancing task performance, token efficiency, and communication readability. We explore various algorithms, including Supervised Fine-Tuning, Direct Preference Optimization, and their hybrid approaches, providing insights into their effectiveness-efficiency trade-offs. We integrate Monte Carlo Tree Search-inspired techniques for DPO data generation, treating conversation turns as tree nodes to explore diverse interaction paths. Evaluated on common multi-agent tasks, including information-asymmetric question answering and complex reasoning, Optima shows consistent and substantial improvements over single-agent baselines and vanilla MAS based on Llama 3 8B / 3.2 3B, achieving up to 2.8x performance gain with less than 10% of the tokens on tasks requiring heavy information exchange. Moreover, Optima’s efficiency gains enable more effective compute utilization during inference, leading to improved inference-time scaling laws. By addressing fundamental challenges in LLM-based MAS, Optima shows the potential for scalable, efficient, and effective MAS.
Multi-Agent Collaboration via Cross-Team Orchestration
Zhuoyun Du | Chen Qian | Wei Liu | Zihao Xie | YiFei Wang | Rennai Qiu | Yufan Dang | Weize Chen | Cheng Yang | Ye Tian | Xuantang Xiong | Lei Han
Findings of the Association for Computational Linguistics: ACL 2025
Large Language Models (LLMs) have significantly impacted various domains, especially through organized LLM-driven autonomous agents. A representative scenario is software development, where agents can collaborate in a team like humans, following predefined phases to complete sub-tasks sequentially. However, for an agent team, each phase yields only one possible outcome. This results in the completion of only one development chain, thereby losing the opportunity to explore multiple potential decision paths within the solution space and consequently leading to suboptimal results or extensive trial and error. To address this, we introduce Cross-Team Orchestration (Croto), a scalable multi-team framework that enables orchestrated teams to jointly propose various task-oriented solutions and exchange their insights in an environment that keeps each team independent while allowing cross-team collaboration, in order to generate superior solutions. Experiments reveal a notable increase in software quality compared to state-of-the-art baselines. We further tested our framework on story generation tasks, which demonstrated its promising ability to generalize to other domains. The code and data are available at https://github.com/OpenBMB/ChatDev/tree/macnet.
Judge as A Judge: Improving the Evaluation of Retrieval-Augmented Generation through the Judge-Consistency of Large Language Models
Shuliang Liu | Xinze Li | Zhenghao Liu | Yukun Yan | Cheng Yang | Zheni Zeng | Zhiyuan Liu | Maosong Sun | Ge Yu
Findings of the Association for Computational Linguistics: ACL 2025
Retrieval-Augmented Generation (RAG) has proven its effectiveness in alleviating hallucinations for Large Language Models (LLMs). However, existing automated evaluation metrics cannot fairly evaluate the outputs generated by RAG models during training and evaluation. LLM-based judgment models provide the potential to produce high-quality judgments, but they are highly sensitive to evaluation prompts, leading to inconsistencies when judging the output of RAG models. This paper introduces the Judge-Consistency (ConsJudge) method, which aims to enhance LLMs to generate more accurate evaluations for RAG models. Specifically, ConsJudge prompts LLMs to generate different judgments based on various combinations of judgment dimensions, utilizes the judge-consistency to evaluate these judgments, and selects the chosen and rejected judgments for DPO training. Our experiments show that ConsJudge can effectively provide more accurate judgments for optimizing RAG models across various RAG models and datasets. Further analysis reveals that judgments generated by ConsJudge have high agreement with the superior LLM. All code is available at https://github.com/OpenBMB/ConsJudge.
Co-authors
- Zhiyuan Liu 3
- Chen Qian 3
- Maosong Sun (孙茂松) 3
- Weize Chen 2
- Chuan Shi 2
- Jinxuan Cai 1
- Yufan Dang 1
- Zhuoyun Du 1
- Lei Han 1
- Xu Han 1
- Xin Li 1
- Xinze Li 1
- Yishen Li 1
- Shuliang Liu 1
- Wei Liu 1
- Yang Liu 1
- Zedi Liu 1
- Zhenghao Liu (刘正皓) 1
- Qi Meng 1
- Rennai Qiu 1
- Ao Sun 1
- Ye Tian 1
- Yifei Wang 1
- Zihao Xie 1
- Xuantang Xiong 1
- Yukun Yan (闫宇坤) 1
- Ge Yu (于戈) 1
- Jiarui Yuan 1
- Zheni Zeng 1
- Xinrong Zhang 1
- Weilin Zhao 1