Bin Chen

Other people with similar names: Bin Chen, Bin Chen, Bin Chen

Unverified author pages with similar names: Bin Chen

2026

Key-Value (KV) caching is widely used in large language models (LLMs) to enable long-context inference efficiently, yet its security implications remain underexplored. We present the first systematic study of how KV cache compression interacts with jailbreak attacks, evaluating four model families under diverse jailbreak attacks. We identify a double-edged effect: (i) on one hand, compression can induce **Accidental Robustness**, where optimization-based and encoding-based attacks fail due to Malicious Semantic Eviction, where attacks’ own attention redirection reduces the malicious query’s cache importance, and Gradient Mismatch where discrete compression operations break jailbreak optimization. (ii) On the other hand, **Vulnerability Paradox** arises under merging-based compression for human-designed Attacks, where aggressive merging in shallow layers triggers functional head collapse, amplifying attack success rates. To address this, we propose **Safe-CAM**, a history-aware, per-head feedback merging strategy that prevents safety degradation while maintaining efficiency. Experiments show Safe-CAM fully restores safety (0% ASR) and improves benign task performance with minimal overhead. Our study highlights that KV cache compression is not only an efficiency mechanism but also a safety-critical design factor in LLM deployment.

pdf bib abs

Diffusion models (DMs) have recently exhibited impressive generation capability. However, their training generally requires huge computational resources and large-scale datasets. To solve these, recent studies empower DMs with Retrieval-Augmented Generation (RAG), yielding retrieval-augmented diffusion models (RDMs) that enhance performance with reduced parameters. Despite the success, RAG may introduce novel security issues that warrant further investigation. In this paper, we propose BadRDM, the first poisoning framework targeting RDMs, to systematically investigate their vulnerability to backdoor attacks. Our framework fully considers RAG’s characteristics by manipulating the retrieved items for specific text triggers to ultimately control the generated outputs. Specifically, we first insert a tiny portion of images into the retrieval database as target toxicity surrogates. We then exploit the contrastive learning mechanism underlying retrieval models by designing a malicious variant that establishes robust shortcuts from triggers to toxicity surrogates. In addition, we introduce novel entropy-based selection and generative augmentation strategies for better toxicity surrogates. Extensive experiments on two mainstream tasks show that the proposed method achieves outstanding attack effects while preserving benign utility. Notably, BadRDM remains effective even under common defense strategies, further highlighting serious security concerns for RDMs.

pdf bib abs

Recently, large language models (LLMs) are capable of generating highly fluent textual content. While they offer significant convenience to humans, they also introduce various risks, like phishing and academic dishonesty. Numerous research efforts have been dedicated to developing algorithms for detecting AI-generated text and constructing relevant datasets. However, in the domain of Chinese corpora, challenges remain, including limited model diversity and data homogeneity. To address these issues, we propose C-ReD: a comprehensive Chinese Real-prompt AI-generated text Detection benchmark. Experiments demonstrate that C-ReD not only enables reliable in-domain detection but also supports strong generalization to unseen LLMs and external Chinese datasets—addressing critical gaps in model diversity, domain coverage, and prompt realism that have limited prior Chinese detection benchmarks. We release our resources at https://github.com/HeraldofLight/C-ReD.

pdf bib abs

While multimodal large language models have demonstrated impressive short-term reasoning, they struggle with long-horizon video understanding due to limited context windows and static memory mechanisms that fail to mirror human cognitive efficiency. Existing paradigms typically fall into two extremes: vision-centric methods that incur high latency and redundancy through dense visual accumulation, or text-centric approaches that suffer from detail loss and hallucination via aggressive captioning. To bridge this gap, we propose **MM-Mem**, a pyramidal multimodal memory architecture grounded in *Fuzzy-Trace Theory*. **MM-Mem** structures memory hierarchically into a *Sensory Buffer*, *Episodic Stream*, and *Symbolic Schema*, enabling the progressive distillation of fine-grained perceptual traces (*verbatim*) into high-level semantic schemas (*gist*).Furthermore, to govern the dynamic construction of memory, we derive a Semantic Information Bottleneck objective and introduce SIB-GRPO to optimize the trade-off between memory compression and task-relevant information retention.In inference, we design an entropy-driven top-down memory retrieval strategy.Extensive experiments across 4 benchmarks confirm that **MM-Mem** achieves state-of-the-art performance on both offline and streaming tasks, demonstrating robust generalization and validating the effectiveness of cognition-inspired memory organization.Code and associated configurations are publicly available at ‘https://github.com/EliSpectre/MM-Mem‘.

pdf bib abs

3D Vision-Language Models (3D-VLMs) have emerged as the critical cognitive backbone for spatial intelligence, enabling precise reasoning over unstructured 3D data. While these models serve as the foundation for downstream robotics and embodied systems, their reliance on autoregressive decoding introduces a fundamental vulnerability regarding inference efficiency. In this work, we present Inflate3D, a novel adversarial framework designed to trigger computational and economic exhaustion in 3D-VLMs. Specifically, we exploit the model’s sensitivity to untrusted 3D assets to hijack the generation process. Inflate3D operates by injecting imperceptible noise that forces the model into a state of pathological verbosity, effectively stalling the inference pipeline. Our approach comprises two synergistic strategies: (1) a semantic-aware adversarial manipulation that leverages internal representations to selectively perturb semantically critical regions while preserving geometric structure, and (2) a trajectory disruption mechanism that manipulates token probabilities to suppress End-of-Sequence (EOS) emission, thereby prolonging decoding and inducing verbose outputs. Experiments on standard benchmarks show that Inflate3D amplifies output length and energy consumption by up to 6.45×, demonstrating a potent capability to drain system resources. These findings expose a critical blind spot in multimodal alignment, highlighting the urgent need to secure spatial foundation models against resource exhaustion attacks.

pdf bib abs

Vision-Language Models (VLMs) encode images and videos into abundant tokens, which contain substantial redundancy and computation cost. While visual token pruning mitigates the issue, most existing methods lack insight into the intrinsic property of the vision encoder itself. In this work, we dive into the vision encoder and prove that the middle layers pay more attention to the main objects of the image qualitatively and quantitatively, while the deep layers to tokens with rich global information. Utilizing this Hierarchical attention pattern, we propose HiPrune, a training-free and model-agnostic token Pruning method. HiPrune identifies three types of visual tokens according to their attention in different phases of the vision encoder, which preserves different levels of information. By coupling with the similarity of text tokens, we propose a prompt-aware variance, HiPrune++, which further improves instruction following performance under a very low token budget. Extensive experiments across four representative VLMs show that HiPrune achieves up to 99.3% of task accuracy with only 1/3 of the tokens, while reducing inference FLOPs by 58.7%. HiPrune++ maintains up to 99.9% accuracy with 2/9 tokens, highlighting robustness under high-resolution.

2025

pdf bib abs

The misuse of large language models (LLMs), such as academic plagiarism, has driven the development of detectors to identify LLM-generated texts. To bypass these detectors, paraphrase attacks have emerged to purposely rewrite these texts to evade detection. Despite the success, existing methods require substantial data and computational budgets to train a specialized paraphraser, and their attack efficacy greatly reduces when faced with advanced detection algorithms. To address this, we propose Contrastive Paraphrase Attack (CoPA), a training-free method that effectively deceives text detectors using off-the-shelf LLMs. The first step is to carefully craft instructions that encourage LLMs to produce more human-like texts. Nonetheless, we observe that the inherent statistical biases of LLMs can still result in some generated texts carrying certain machine-like attributes that can be captured by detectors. To overcome this, CoPA constructs an auxiliary machine-like word distribution as a contrast to the human-like distribution generated by the LLM. By subtracting the machine-like patterns from the human-like distribution during the decoding process, CoPA is able to produce sentences that are less discernible by text detectors. Our theoretical analysis suggests the superiority of the proposed attack. Extensive experiments validate the effectiveness of CoPA in fooling text detectors across various scenarios.

pdf bib abs

The rapid advancement of large language models has intensified public concerns about the potential misuse. Therefore, it is important to build trustworthy AI-generated text detection systems. Existing methods neglect stylistic modeling and mostly rely on static thresholds, which greatly limits the detection performance. In this paper, we propose the Mixture of Stylistic Experts (MoSEs) framework that enables stylistics-aware uncertainty quantification through conditional threshold estimation. MoSEs contain three core components, namely, the Stylistics Reference Repository (SRR), the Stylistics-Aware Router (SAR), and the Conditional Threshold Estimator (CTE). For input text, SRR can activate the appropriate reference data in SRR and provide them to CTE. Subsequently, CTE jointly models the linguistic statistical properties and semantic features to dynamically determine the optimal threshold. With a discrimination score, MoSEs yields prediction labels with the corresponding confidence level. Our framework achieves an average improvement 11.34% in detection performance compared to baselines. More inspiringly, MoSEs shows a more evident improvement 39.15% in the low-resource case. Our code is available at https://github.com/creator-xi/MoSEs.

pdf bib abs

Composed Image Retrieval (CIR) enables users to search for images using multimodal queries that combine text and reference images. While metric learning methods have shown promise, they rely on deterministic point embeddings that fail to capture the inherent uncertainty in the input data, in which user intentions may be imprecisely specified or open to multiple interpretations. We address this challenge by reformulating CIR through our proposed Composed Probabilistic Embedding (CoPE) framework, which represents both queries and targets as Gaussian distributions in latent space rather than fixed points. Through careful design of probabilistic distance metrics and hierarchical learning objectives, CoPE explicitly captures uncertainty at both instance and feature levels, enabling more flexible, nuanced, and robust matching that can handle polysemy and ambiguity in search intentions. Extensive experiments across multiple benchmarks demonstrate that CoPE effectively quantifies both quality and semantic uncertainties within Composed Image Retrieval, achieving state-of-the-art performance on recall rate. Code: https://github.com/tanghme0w/ACL25-CoPE.