Chongye Guo

2025

Large Language Model (LLM)-based Multi-agent Systems (MAS) have demonstrated remarkable capabilities in various complex tasks, ranging from collaborative problem-solving to autonomous decision-making. However, as these systems become increasingly integrated into critical applications, their vulnerability to adversarial attacks, misinformation propagation, and unintended behaviors have raised significant concerns. To address this challenge, we introduce G-Safeguard, a topology-guided security lens and treatment for robust LLM-MAS, which leverages graph neural networks to detect anomalies on the multi-agent utterance graph and employ topological intervention for attack remediation. Extensive experiments demonstrate that G-Safeguard: (I) exhibits significant effectiveness under various attack strategies, recovering over 40% of the performance for prompt injection; (II) is highly adaptable to diverse LLM backbones and large-scale MAS; (III) can seamlessly combine with mainstream MAS with security guarantees.

Large Language Models (LLMs) have revolutionized language processing and understanding, yet their performance is hampered by inaccuracies and outdated information. Model editing techniques offer a solution but face two key challenges: **(I)** Most methods inject knowledge by constructing rigid loss, which leads to poor compatibility when dealing with higher-order multi-hop problems. **(II)** Locate-then-edit vein, by altering pre-trained parameters, inevitably affect normal knowledge and even face the catastrophic forgetting. In this paper, we introduce **KGMET**, a framework that constructs knowledge graphs using available information to guide the direction of knowledge editing, enabling **consistent**, **aligned**, and **stable** information during **large-scale** editing scenario. Furthermore, *KGMET* goes beyond this by employing orthogonal constraints to block the interference of irrelevant information, ensuring the updates are both controllable and generalizable. Experiments on Multi-Conterfact, ZsRE, and MQuAKE datasets using *Llama-3-8B*, *GPT-J-6B*, and *GPT-2-XL* models showcase improvements over state-of-the-art methods, with ↑ 5%-17% in multi-hop tasks while remaining generalizable (at least ↑ 20% in fluency). Our code is available on Github.

Co-authors

Miao Yu 1

Venues

acl1
findings1

Fix author