Yiwei Dai
2026
BlindGuard: Safeguarding LLM-based Multi-Agent Systems under Unknown Attacks
Rui Miao | Yixin Liu | Yili Wang | Xu Shen | Yue Tan | Yiwei Dai | Shirui Pan | Xin Wang
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
The security of LLM-based multi-agent systems (MAS) is critically threatened by propagation vulnerability, where malicious agents can distort collective decision-making through inter-agent interactions. While existing supervised defense methods demonstrate promising performance, they may be impractical in real-world scenarios due to their heavy reliance on labeled malicious agents to train a supervised malicious detection model. To enable practical and generalizable MAS defenses, in this paper, we propose BlindGuard, an unsupervised defense method that learns without requiring any attack-specific labels or prior knowledge of malicious behaviors. To this end, we establish a hierarchical agent encoder to capture individual, neighborhood, and global interaction patterns of each agent, providing a comprehensive understanding for malicious agent detection. Meanwhile, we design a corruption-guided detector that consists of directional noise injection and contrastive learning, allowing effective detection model training solely on normal agent behaviors. Extensive experiments show that BlindGuard effectively detects diverse attack types across MAS with various communication patterns while maintaining superior generalizability compared to supervised baselines.
MoEC: A Memory-Routed Mixture-of-Experts Controller for Adaptive Minecraft Control
Hui Wu | Chao Xu | Jianghui Wang | Ziqiong Liu | Dong Li | Yiwei Dai | Emad Barsoum
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Embodied agents in open-ended environments such as Minecraft increasingly adopt planner–controller architectures, with large language models acting as high-level planners. While planning has advanced rapidly, control remains underexplored. Existing systems commonly rely on a monolithic policy to execute subgoals across varying contexts, forcing incompatible behaviors into a shared parameter space and causing interference that scaling only partially mitigates. To address this, we propose MoEC, a Memory-Routed Mixture-of-Experts Controller for Adaptive Minecraft Control. MoEC routes via a subgoal-indexed, non-parametric expert memory and regulates capacity through failure-triggered expert growth and redundancy-aware consolidation. This design enables continual adaptation without full retraining while maintaining parameter efficiency and bounded inference cost. We evaluate MoEC on diverse and compositional Minecraft tasks, demonstrating significant gains in adaptability, robustness, and execution consistency over strong baselines, yielding a scalable and efficient alternative for open-ended control.
2025
Understanding the Information Propagation Effects of Communication Topologies in LLM-based Multi-Agent Systems
Xu Shen | Yixin Liu | Yiwei Dai | Yili Wang | Rui Miao | Yue Tan | Shirui Pan | Xin Wang
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
The communication topology in large language model-based multi-agent systems fundamentally governs inter-agent collaboration patterns, critically shaping both the efficiency and effectiveness of collective decision-making. While recent studies on automated communication topology design tend to construct sparse structures for efficiency, they often overlook why and when sparse and dense topologies help or hinder collaboration. In this paper, we present a causal framework to analyze how agent outputs, whether correct or erroneous, propagate under topologies with varying sparsity. Our empirical studies reveal that moderately sparse topologies, which effectively suppress error propagation while preserving beneficial information diffusion, typically achieve optimal task performance. Guided by this insight, we propose a novel topology design approach, EIB-Learner, that balances error suppression and beneficial information propagation by fusing connectivity patterns from both dense and sparse graphs. Extensive experiments show that EIB-Learner is superior in effectiveness, communication cost, and robustness.
2024
Mitigate Extrinsic Social Bias in Pre-trained Language Models via Continuous Prompts Adjustment
Yiwei Dai | Hengrui Gu | Ying Wang | Xin Wang
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Although pre-trained language models (PLMs) have been widely used in natural language understanding (NLU), they are still exposed to fairness issues. Most existing extrinsic debiasing methods rely on manually curated word lists for each sensitive group to modify training data or to add regularization constraints. However, these word lists are often limited in length and scope, which degrades the performance of extrinsic bias mitigation. To address these issues, we propose a **C**ontinuous **P**rompts **A**djustment **D**ebiasing method (CPAD), which generates continuous token lists from the entire vocabulary space and uses them to bridge the gap between outputs and targets in the fairness learning process. Specifically, CPAD encapsulates the fine-tuning objective and the debiasing objectives into several independent prompts. To avoid the limitations of manual word lists, in the fairness learning phase we extract outputs from the entire vocabulary space via the fine-tuned PLM. Then, we aggregate the outputs from the same sensitive group as continuous token lists to map the outputs to protected attribute labels. Finally, after learning the debiasing prompts from the perspective of adversarial learning, we improve fairness by adjusting the continuous prompts at model inference time. Through extensive experiments on three NLU tasks, we evaluate the debiasing performance from the perspectives of group fairness and fairness through unawareness. The experimental results show that CPAD outperforms all baselines in terms of single-attribute and two-attribute debiasing performance.