Erlu Zhao

2026

Optimizing communication topology is fundamental to the efficiency and effectiveness of Large Language Model (LLM)-based Multi-Agent Systems (MAS). While recent approaches utilize reinforcement learning to dynamically construct task-specific graphs, they typically rely on single-sample policy gradients with absolute rewards (e.g., binary correctness). This paradigm suffers from severe gradient variance and the credit assignment problem: simple queries yield non-informative positive rewards for suboptimal structures, while difficult queries often result in failures that provide no learning signal. To address these challenges, we propose Graph-GRPO, a novel topology optimization framework that integrates Group Relative Policy Optimization. Instead of evaluating a single topology in isolation, Graph-GRPO samples a group of diverse communication graphs for each query and computes the advantage of specific edges based on their relative performance within the group. By normalizing rewards across the sampled group, our method effectively mitigates the noise derived from task difficulty variance and enables fine-grained credit assignment. Extensive experiments on reasoning and code generation benchmarks demonstrate that Graph-GRPO significantly outperforms state-of-the-art baselines, achieving superior training stability and identifying critical communication pathways previously obscured by reward noise.

2025

pdf bib abs

DINT Transformer
Yueyang Cang | Yuhang Liu | Xiaoteng Zhang | Erlu Zhao | Li Shi
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

The DIFF Transformer mitigates interference from irrelevant contexts by introducing a differential attention mechanism, thereby enhancing focus on critical tokens. However, this architecture suffers from two major limitations: first, its use of two independent attention matrices leads to numerical instability, and second, it lacks global context modeling, which is essential for identifying globally significant tokens. To address these challenges, we propose the DINT Transformer, which extends the DIFF Transformer by incorporating an integral mechanism. By computing global importance scores and integrating them into the attention matrix, the DINT Transformer not only improves overall numerical stability but also significantly enhances its ability to capture global dependencies. Experimental results demonstrate that the DINT Transformer achieves superior accuracy and robustness across various practical applications, including long-context language modeling and key information retrieval. These advancements establish the DINT Transformer as a highly effective and promising architecture.

Co-authors

Venues

EMNLP1
Findings1

Fix author