Ruichen Zhang

2026

The Adaptive Interrogator: Detecting Trojan LLMs in Multi-Agent Systems via Evolved Conversational Strategies
Rana Muhammad Shahroz Khan | Ruichen Zhang | Zhen Tan | Charles Fleming | Tianlong Chen
Findings of the Association for Computational Linguistics: ACL 2026

While Large Language Model (LLM) safety has focused on single-agent, white-box settings, the adoption of Multi-Agent Systems (MAS) creates a critical blind spot: supply chain vulnerabilities in MAS ecosystems. These systems often rely on third-party agents accessed via black-box APIs, creating risks where attackers can embed hidden triggers to manipulate collective reasoning or outputs. Because internal weights are inaccessible, traditional white-box defenses fail to detect these threats. Consequently, a critical gap exists in auditing these systems for “Trojan” agents, i.e., malicious models that behave normally until triggered by specific, often multi-turn, conversational contexts. To bridge this gap, we introduce the Conversational Trojan Unmasking System (CTUS), a black-box auditing framework that leverages an Evolutionary Algorithm (EA) to autonomously expose hidden threats. Drawing on social deduction mechanics, CTUS deploys a “Judge” agent to evolve conversational probes that provoke Trojan agents into revealing their malicious nature without alerting benign peers. We validate CTUS across diverse architectures (Llama-2/3, Gemma, Mistral) and attack vectors (word, syntax, semantic, RLHF). Our results demonstrate that CTUS achieves superior detection rates (up to 100% in specific configurations). Furthermore, we conduct rigorous analyses to confirm the framework’s robustness, exhibiting negligible false positives on benign systems and stability across system configurations, establishing CTUS as a scalable safeguard for the multi-agent landscape.

2025


Foundation models for single-cell RNA sequencing (scRNA-seq) have shown promising capabilities in capturing gene expression patterns. However, current approaches face critical limitations: they ignore biological prior knowledge encoded in gene regulatory relationships and fail to leverage multi-omics signals that could provide complementary regulatory insights. In this paper, we propose GRNFormer, a new framework that systematically integrates multi-scale Gene Regulatory Networks (GRNs) inferred from multi-omics data into RNA foundation model training. Our framework introduces two key innovations. First, we introduce a pipeline for constructing hierarchical GRNs that capture regulatory relationships at both cell-type-specific and cell-specific resolutions. Second, we design a structure-aware integration framework that addresses the information asymmetry in GRNs through two technical advances: (1) a graph topological adapter using multi-head cross-attention to weight regulatory relationships dynamically, and (2) a novel edge perturbation strategy that perturbs GRNs with biologically informed co-expression links to augment graph neural network training. Comprehensive experiments have been conducted on three representative downstream tasks across multiple model architectures to demonstrate the effectiveness of GRNFormer. It achieves consistent improvements over state-of-the-art (SoTA) baselines: a 3.6% increase in drug response prediction correlation, a 9.6% improvement in single-cell drug classification AUC, and a 1.1% average gain in gene perturbation prediction accuracy.
