Zhewei Wei
2026
MotifAgent: Learning Molecular Assembly through Multi-Agent Collaboration for Chemical Language Understanding
Jinjia Feng | Wenda Wang | Zhewei Wei
Findings of the Association for Computational Linguistics: ACL 2026
Large Language Models (LLMs) have shown great potential in molecular understanding by aligning molecular representations with text. However, existing approaches remain limited to static motif recognition without comprehending the generative principles—the connection rules governing how motifs assemble into valid topological structures. To address this challenge, we introduce **MotifAgent**, a multi-agent reinforcement learning framework inspired by emergent collective intelligence. We formulate molecular assembly as a collaborative problem where each motif is represented by an agent sharing a common LLM backbone, learning connection rules through explicit inter-motif negotiation rather than implicit sequence memorization. Key innovations include: (1) dynamic inter-agent negotiation for modeling motif connections; (2) Set-based Behavioral Cloning for learning multiple topologically equivalent assembly paths; (3) topology-aware reward shaping with MAPPO to maintain chemical validity while optimizing target properties. Extensive experiments demonstrate that MotifAgent achieves state-of-the-art performance across molecular property prediction, description generation, and reaction prediction tasks, with our generalist model surpassing specialized expert models.
GRAPHIA: Harnessing Social Graph Data to Enhance LLM-Based Social Simulation
Jiarui Ji | Zehua Zhang | Zhewei Wei | Bin Tong | Guan Wang | Bo Zheng
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Large language models (LLMs) have shown promise in simulating human-like social behaviors. Social graphs provide high-quality supervision signals that encode both local interactions and global network structure, yet they remain underutilized for LLM training. To address this gap, we propose Graphia, the first general LLM-based social graph simulation framework that leverages graph data as supervision for LLM post-training via reinforcement learning. With GNN-based structural rewards, Graphia trains specialized agents to predict whom to interact with (destination selection) and how to interact (edge generation), and integrates them into dedicated graph generation pipelines. We evaluate Graphia under two settings: Transductive Dynamic Graph Generation (TDGG), a micro-level task with our proposed node-wise interaction alignment metrics; and Inductive Dynamic Graph Generation (IDGG), a macro-level task with our proposed metrics for aligning emergent network properties. On three real-world networks, Graphia improves micro-level alignment by 6.1% in the composite destination selection score, 12% in edge classification accuracy, and 27.9% in edge content BERTScore over the strongest baseline. For macro-level alignment, it achieves 35.98% higher structural similarity and 28.71% better replication of social phenomena such as power laws and echo chambers. Our results show that social graphs can serve as high-quality supervision signals for LLM post-training, closing the gap between agent behaviors and network dynamics for LLM-based simulation. Code is available at https://github.com/Ji-Cather/Graphia.git.
2025
LLM-Based Multi-Agent Systems are Scalable Graph Generative Models
Jiarui Ji | Runlin Lei | Jialing Bi | Zhewei Wei | Xu Chen | Yankai Lin | Xuchen Pan | Yaliang Li | Bolin Ding
Findings of the Association for Computational Linguistics: ACL 2025
The structural properties of naturally arising social graphs are extensively studied to understand their evolution. Prior approaches for modeling network dynamics typically rely on rule-based models, which lack realism and generalizability, or deep learning-based models, which require large-scale training datasets. As abstract graph representations of entity-wise interactions, social graphs present an opportunity to explore network evolution mechanisms through realistic simulations of human-item interactions. Leveraging the pre-trained social consensus knowledge embedded in large language models (LLMs), we present GraphAgent-Generator (GAG), a novel simulation-based framework for dynamic, text-attributed social graph generation. GAG simulates the temporal node and edge generation processes for zero-shot social graph generation. The resulting graphs adhere to seven key macroscopic network properties, achieving an 11% improvement in microscopic graph structure metrics. Through the node classification benchmarking task, we validate that GAG effectively captures the intricate text-structure correlations in graph generation. Furthermore, GAG supports generating graphs with up to nearly 100,000 nodes or 10 million edges through large-scale LLM-based agent simulation with parallel acceleration, achieving a minimum speed-up of 90.4%. The source code is available at https://github.com/Ji-Cather/GraphAgent.
Towards Effective and Efficient Continual Pre-training of Large Language Models
Jie Chen | Zhipeng Chen | Jiapeng Wang | Kun Zhou | Yutao Zhu | Jinhao Jiang | Yingqian Min | Wayne Xin Zhao | Zhicheng Dou | Jiaxin Mao | Yankai Lin | Ruihua Song | Jun Xu | Xu Chen | Rui Yan | Zhewei Wei | Di Hu | Wenbing Huang | Ji-Rong Wen
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Continual pre-training (CPT) has been an important approach for adapting language models to specific domains or tasks. In this paper, we comprehensively study its key design choices for balancing the acquisition of new abilities against the retention of original ones, and present an effective CPT method that substantially improves the Chinese language ability and scientific reasoning ability of LLMs. To achieve this, we design specific data mixture and curriculum strategies based on existing datasets and synthetic high-quality data. Concretely, we synthesize multidisciplinary scientific QA pairs from related web pages to ensure data quality, and devise a performance-tracking and data-mixture adjustment strategy to maintain training stability. For the detailed designs, we conduct preliminary studies on a relatively small model and summarize the findings to help optimize our CPT method. Extensive experiments on a number of evaluation benchmarks show that our approach substantially improves the performance of Llama-3 (8B) on both general abilities (+8.81 on C-Eval and +6.31 on CMMLU) and scientific reasoning abilities (+12.00 on MATH and +4.13 on SciEval). Our model, data, and code are available at https://github.com/RUC-GSAI/Llama-3-SynE.
2024
SRAP-Agent: Simulating and Optimizing Scarce Resource Allocation Policy with LLM-based Agent
Jiarui Ji | Yang Li | Hongtao Liu | Zhicheng Du | Zhewei Wei | Qi Qi | Weiran Shen | Yankai Lin
Findings of the Association for Computational Linguistics: EMNLP 2024
Public scarce resource allocation plays a crucial role in economics, as it directly influences efficiency and equity in society. Traditional studies, including theoretical model-based, empirical, and simulation-based methods, face limitations arising from idealized assumptions of complete information and individual rationality, as well as from the scarcity of available data. In this work, we propose an innovative framework, SRAP-Agent, which integrates Large Language Models (LLMs) into economic simulations, aiming to bridge the gap between theoretical models and real-world dynamics. Using public housing allocation as a case study, we conduct extensive policy simulation experiments to verify the feasibility and effectiveness of SRAP-Agent, and employ the Policy Optimization Algorithm under specified optimization objectives. The source code is available at https://github.com/jijiarui-cather/SRAPAgent_Framework.
Co-authors (number of joint papers listed above)
- Jiarui Ji (3)
- Yankai Lin (林衍凯) (3)
- Xu Chen (2)
- Jialing Bi (1)
- Jie Chen (1)
- Zhipeng Chen (1)
- Bolin Ding (1)
- Zhicheng Dou (窦志成) (1)
- Zhicheng Du (1)
- Jinjia Feng (1)
- Di Hu (1)
- Wenbing Huang (1)
- Jinhao Jiang (1)
- Runlin Lei (1)
- Yaliang Li (1)
- Yang Li (1)
- Hongtao Liu (1)
- Jiaxin Mao (1)
- Yingqian Min (1)
- Xuchen Pan (1)
- Qi Qi (1)
- Weiran Shen (1)
- Ruihua Song (1)
- Bin Tong (1)
- Wenda Wang (1)
- Jiapeng Wang (1)
- Guan Wang (1)
- Ji-Rong Wen (1)
- Jun Xu (1)
- Rui Yan (1)
- Zehua Zhang (1)
- Wayne Xin Zhao (1)
- Bo Zheng (1)
- Kun Zhou (1)
- Yutao Zhu (朱余韬) (1)