Xiaowen Ma
2026
Self-Evolving Multi-Agent Systems via Textual Backpropagation
Xiaowen Ma | Yunpu Ma | Chenyang Lin | Sikuan Yan | Jinhe Bi | Zixuan Cao | Yijun Tian | Volker Tresp | Hinrich Schuetze
Findings of the Association for Computational Linguistics: ACL 2026
Leveraging multiple Large Language Models (LLMs) has proven effective for addressing complex, high-dimensional tasks, but current approaches often rely on static, manually engineered multi-agent configurations. To overcome these constraints, we present the Agentic Neural Network (ANN), a framework that conceptualizes multi-agent collaboration as a layered neural network architecture. In this design, each agent operates as a node, and each layer forms a cooperative team focused on a specific subtask. The proposed framework follows a two-phase optimization strategy: (1) Forward Phase - Drawing inspiration from neural network forward passes, tasks are dynamically decomposed into subtasks, and cooperative agent teams with suitable aggregation methods are constructed layer by layer. (2) Backward Phase - Mirroring backpropagation, we refine both global and local collaboration through iterative feedback, allowing agents to self-evolve their roles, prompts, and coordination. This neuro-symbolic approach enables our framework to create new or specialized agent teams post-training, delivering notable gains in accuracy and adaptability. Across seven benchmark datasets, ANN surpasses leading multi-agent baselines under the same configurations, showing consistent performance improvements.
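The two-phase loop described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `Agent`, `Layer`, `forward`, and `backward` are hypothetical, each agent is a stub that echoes its input instead of calling an LLM, and the "backward" step simply appends textual feedback to every prompt rather than performing the paper's refinement procedure.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Agent:
    """One node of the network (hypothetical stand-in for an LLM-backed agent)."""
    role: str
    prompt: str

    def run(self, task: str) -> str:
        # Assumption: a real agent would send prompt + task to a model.
        # Here the output is just the task tagged with the agent's role.
        return f"[{self.role}] {task}"


@dataclass
class Layer:
    """A cooperative team plus the function that aggregates its outputs."""
    agents: List[Agent]
    aggregate: Callable[[List[str]], str]


def forward(layers: List[Layer], task: str) -> Tuple[str, List[str]]:
    """Pass the task through the layers in order, recording each layer's output."""
    trace: List[str] = []
    current = task
    for layer in layers:
        outputs = [agent.run(current) for agent in layer.agents]
        current = layer.aggregate(outputs)
        trace.append(current)
    return current, trace


def backward(layers: List[Layer], feedback: str) -> None:
    """Walk the layers in reverse and fold textual feedback into every prompt."""
    for layer in reversed(layers):
        for agent in layer.agents:
            agent.prompt += f"\nFeedback: {feedback}"


layers = [
    Layer([Agent("planner", "Plan the task."), Agent("solver", "Solve it.")], " | ".join),
    Layer([Agent("checker", "Verify the answer.")], " | ".join),
]
result, trace = forward(layers, "2+2")
backward(layers, "be concise")
```

In this toy version the aggregation method is plain string joining; the abstract indicates that suitable aggregation methods are chosen per layer, which a fuller sketch would model by passing a different `aggregate` callable to each `Layer`.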
Memory-R1: Enhancing Large Language Model Agents to Manage and Utilize Memories via Reinforcement Learning
Sikuan Yan | Xiufeng Yang | Zuchao Huang | Ercong Nie | Zifeng Ding | Zonggen Li | Xiaowen Ma | Jinhe Bi | Kristian Kersting | Jeff Z. Pan | Hinrich Schuetze | Volker Tresp | Yunpu Ma
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Large Language Models (LLMs) have demonstrated impressive capabilities across a wide range of NLP tasks, but they remain fundamentally stateless, constrained by limited context windows that hinder long-horizon reasoning. Recent efforts to address this limitation often augment LLMs with an external memory bank, yet most existing pipelines are static and heuristic-driven, lacking a learned mechanism for deciding what to store, update, or retrieve. We present Memory-R1, a reinforcement learning (RL) framework that equips LLMs with the ability to actively manage and utilize external memory through two specialized agents: a Memory Manager that learns structured operations, including ADD, UPDATE, DELETE, and NOOP; and an Answer Agent that pre-selects and reasons over relevant entries. Both agents are fine-tuned with outcome-driven RL (PPO and GRPO), enabling adaptive memory management with minimal supervision. With only 152 training QA pairs, Memory-R1 outperforms strong baselines and generalizes across diverse question types, three benchmarks (LoCoMo, MSC, LongMemEval), and multiple model scales (3B–14B).
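The four memory operations named in this abstract can be pictured with a small sketch of an external memory bank. Everything here is an assumption made for illustration: the `MemoryBank` class, its `apply` and `select` methods, and the word-overlap scoring are hypothetical, and the caller chooses the operation by hand, whereas in Memory-R1 that choice is what the Memory Manager learns through RL.

```python
from enum import Enum
from typing import Dict, List, Optional


class Op(Enum):
    ADD = "ADD"
    UPDATE = "UPDATE"
    DELETE = "DELETE"
    NOOP = "NOOP"


class MemoryBank:
    """Toy external memory: integer ids mapped to text entries (hypothetical)."""

    def __init__(self) -> None:
        self.entries: Dict[int, str] = {}
        self._next_id = 0

    def apply(self, op: Op, text: Optional[str] = None,
              entry_id: Optional[int] = None) -> Optional[int]:
        """Apply one operation and return the affected id (None for NOOP)."""
        if op is Op.NOOP:
            return None
        if op is Op.ADD:
            if text is None:
                raise ValueError("ADD requires text")
            new_id = self._next_id
            self.entries[new_id] = text
            self._next_id += 1
            return new_id
        # UPDATE and DELETE both need an existing entry.
        if entry_id not in self.entries:
            raise KeyError(f"no memory entry with id {entry_id}")
        if op is Op.UPDATE:
            if text is None:
                raise ValueError("UPDATE requires text")
            self.entries[entry_id] = text
        else:  # Op.DELETE
            del self.entries[entry_id]
        return entry_id

    def select(self, query: str, k: int = 3) -> List[str]:
        """Naive stand-in for pre-selection: rank entries by shared lowercase words."""
        query_words = set(query.lower().split())
        scored = [
            (len(query_words & set(text.lower().split())), text)
            for text in self.entries.values()
        ]
        ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: -s[0])
        return [text for _, text in ranked[:k]]


bank = MemoryBank()
dog_id = bank.apply(Op.ADD, "Alice adopted a dog")
city_id = bank.apply(Op.ADD, "Bob lives in Berlin")
bank.apply(Op.UPDATE, "Alice adopted two dogs", entry_id=dog_id)
bank.apply(Op.DELETE, entry_id=city_id)
bank.apply(Op.NOOP)
relevant = bank.select("What did Alice adopt?")
```

Keeping the operation set this small means a manager's decision reduces to picking one of four labels plus an optional target entry, which is the kind of discrete choice an outcome-driven reward can score.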