Jie Zhang


2026

Large Language Model-based Multi-Agent Systems (LLM-MASs) represent a promising paradigm for tackling complex problems through agent collaboration. However, their reliance on open-ended communication exposes a fundamental vulnerability: the collaborative process itself can be exploited and disrupted. In this work, we formalize this threat class as Denial-of-Collaboration (DoC). Unlike Denial-of-Service (DoS) attacks, which target individual nodes or services, DoC attacks corrupt the collaborative structure of the system, turning its communication topology into a vehicle for self-sabotage. The result is excessive resource consumption and eventual system paralysis. We introduce **CO**ntagious **R**ecursive **B**locking **A**ttacks (CORBA) as a concrete example of DoC. CORBA employs benign yet recursively contagious instructions that force LLM-MASs into cycles of meaningless message passing. Critically, because our attacks are semantically benign, they easily bypass conventional safety alignments, which are not designed to detect behavioral or systemic attacks. Through extensive experiments across diverse topologies and models, we demonstrate that CORBA achieves system paralysis where baseline attacks fail. Our work reveals emerging DoC threats in current LLM-MAS security and establishes a crucial baseline for developing robust, collaboration-aware defense mechanisms.

2025

Text sanitization, which employs differential privacy to replace sensitive tokens with new ones, represents a significant technique for privacy protection. Typically, its performance in preserving privacy is evaluated by measuring the attack success rate (ASR) of reconstruction attacks, where attackers attempt to recover the original tokens from the sanitized ones. However, current reconstruction attacks on text sanitization are developed empirically, making it challenging to accurately assess the effectiveness of sanitization. In this paper, we aim to provide a more accurate evaluation of sanitization effectiveness. Inspired by the works of Palamidessi et al., we implement theoretically optimal reconstruction attacks targeting text sanitization. We derive their bounds on ASR as benchmarks for evaluating sanitization performance. For real-world applications, we propose two practical reconstruction attacks based on these theoretical findings. Our experimental results underscore the necessity of reassessing these overlooked risks. Notably, one of our attacks achieves a 46.4% improvement in ASR over the state-of-the-art baseline, with a privacy budget of 𝜖=4.0 on the SST-2 dataset. Our code is available at: https://github.com/mengtong0110/On-the-Vulnerability-of-Text-Sanitization.
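
As a rough illustration of how an upper bound on ASR can be computed, the sketch below assumes a toy three-token vocabulary, a uniform prior, and an exponential-mechanism-style sanitizer over a made-up distance table; none of these come from the paper, and the paper's actual mechanisms and attacks may differ. Under these assumptions, the attacker who guesses the token maximizing prior times channel probability is Bayes-optimal, so its expected success rate bounds that of any other attack on the same mechanism.

```python
import math

def exponential_mechanism(distances, epsilon):
    """Build M[x][y] = P(y | x), proportional to exp(-epsilon * d(x, y) / 2).

    Assumption: this is a generic distance-based sanitizer used only for
    illustration, not the specific mechanism evaluated in the paper.
    """
    matrix = []
    for row in distances:
        weights = [math.exp(-epsilon * d / 2.0) for d in row]
        total = sum(weights)
        matrix.append([w / total for w in weights])
    return matrix

def bayes_optimal_attack(prior, mechanism):
    """For each sanitized token y, guess the x maximizing prior[x] * P(y | x)."""
    n_inputs = len(prior)
    n_outputs = len(mechanism[0])
    guesses = []
    for y in range(n_outputs):
        scores = [prior[x] * mechanism[x][y] for x in range(n_inputs)]
        guesses.append(max(range(n_inputs), key=scores.__getitem__))
    return guesses

def expected_asr(prior, mechanism, guesses):
    """Probability that the guess for the observed output equals the true input."""
    return sum(prior[g] * mechanism[g][y] for y, g in enumerate(guesses))

# Hypothetical toy setup: three tokens on a line, uniform prior.
toy_distances = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]
toy_prior = [1 / 3, 1 / 3, 1 / 3]

for eps in (0.0, 1.0, 4.0):
    channel = exponential_mechanism(toy_distances, eps)
    attack = bayes_optimal_attack(toy_prior, channel)
    print(f"epsilon={eps}: optimal ASR = {expected_asr(toy_prior, channel, attack):.3f}")
```

With a privacy budget of zero the sanitized token carries no information, so the optimal attacker can do no better than guessing the most likely token under the prior; as the budget grows, the bound rises toward 1.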