Tianle Zhang

2026

The Mark Fades: Adaptive Evolutionary Paraphrase-based Attack against LLM Watermarks
Yusheng Zhao | Jian Zhao | Tianle Zhang | Feng Wei | Xuelong Li
Findings of the Association for Computational Linguistics: ACL 2026

While LLM watermarking is essential for identifying machine-generated content, existing paraphrase-based attacks struggle to balance watermark removal efficacy with text quality. We propose TSAPA, a training-free evolutionary framework that models watermark removal as a constrained multi-objective optimization problem. By leveraging genetic algorithms to navigate the Pareto front, TSAPA uses a Pseudo-Log-Likelihood (PLL)-guided mutation to precisely target and modify watermark-carrying tokens. Experiments on the Qwen3 series (1.7B/8B/32B) across multiple watermark schemes show that TSAPA achieves over 90% attack success rate (ASR) while maintaining high semantic fidelity, significantly outperforming baseline methods. This work exposes critical vulnerabilities in current watermarks and provides a new perspective for robust evaluation.
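As a minimal sketch of the constrained multi-objective selection step described above, the Python below keeps only candidate paraphrases that clear a semantic-similarity floor and are not Pareto-dominated on two objectives: a watermark detection score (lower is better) and semantic similarity to the source (higher is better). The names `dominates`, `pareto_front`, and `min_similarity`, the two-objective setup, and the use of a similarity floor as the constraint are illustrative assumptions, not TSAPA's actual implementation; the scores are taken as given rather than produced by a real detector or similarity model.

```python
from typing import List, Tuple

# Hypothetical candidate record: (text, detection_score, similarity).
# detection_score: watermark detector output, lower = weaker watermark signal.
# similarity: semantic similarity to the source text, higher = better fidelity.
Candidate = Tuple[str, float, float]


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if `a` is no worse than `b` on both objectives and strictly better on one."""
    no_worse = a[1] <= b[1] and a[2] >= b[2]
    strictly_better = a[1] < b[1] or a[2] > b[2]
    return no_worse and strictly_better


def pareto_front(candidates: List[Candidate], min_similarity: float) -> List[Candidate]:
    """Return the feasible, non-dominated candidates in their input order.

    The similarity floor stands in for the quality constraint (an assumption);
    candidates below it are discarded before dominance is checked.
    """
    feasible = [c for c in candidates if c[2] >= min_similarity]
    # A candidate never dominates itself, so no self-exclusion is needed.
    return [c for c in feasible if not any(dominates(o, c) for o in feasible)]
```

For example, `pareto_front([("a", 0.9, 0.95), ("b", 0.2, 0.80), ("c", 0.3, 0.70), ("d", 0.1, 0.60)], min_similarity=0.65)` drops `d` for failing the similarity constraint and `c` because `b` dominates it, returning the entries for `a` and `b`. In a genetic algorithm, a front like this would typically seed the next round of mutation and crossover.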


Multimodal Large Language Models (MLLMs) frequently hallucinate due to their reliance on fragile, linear reasoning and weak visual grounding. We propose Visual Attention Reasoning (VAR), a reinforcement learning framework that reformulates reasoning as a hierarchical search with self-verification. VAR enforces traceable evidence grounding by generating explicit bounding boxes, guided by a novel reward function combining geometric precision and semantic sufficiency. Furthermore, it replaces linear Chain-of-Thought with a tree-search policy capable of backtracking to correct logical errors. Theoretical analysis validates the framework’s reliability, and extensive experiments demonstrate that VAR significantly outperforms state-of-the-art methods on complex hallucination and safety benchmarks.
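The reward described above can be sketched, under loose assumptions, as a weighted sum of a geometric term and a semantic term. Here the geometric term is intersection-over-union (IoU) between a predicted and a reference bounding box in `(x1, y1, x2, y2)` format, and the semantic-sufficiency score is assumed to arrive from elsewhere as a number in [0, 1]. The function name `grounding_reward`, the linear combination, and the weight `alpha` are hypothetical and are not taken from the VAR paper.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), with x1 <= x2 and y1 <= y2


def box_area(box: Box) -> float:
    """Area of an axis-aligned box; degenerate boxes have area 0."""
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])


def iou(pred: Box, ref: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes; 0.0 if the union is empty."""
    inter_w = max(0.0, min(pred[2], ref[2]) - max(pred[0], ref[0]))
    inter_h = max(0.0, min(pred[3], ref[3]) - max(pred[1], ref[1]))
    inter = inter_w * inter_h
    union = box_area(pred) + box_area(ref) - inter
    return inter / union if union > 0 else 0.0


def grounding_reward(pred: Box, ref: Box, sufficiency: float, alpha: float = 0.5) -> float:
    """Hypothetical reward: alpha * IoU + (1 - alpha) * sufficiency.

    `sufficiency` is assumed to be an externally computed score in [0, 1].
    """
    if not 0.0 <= sufficiency <= 1.0:
        raise ValueError("sufficiency must lie in [0, 1]")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * iou(pred, ref) + (1.0 - alpha) * sufficiency
```

A convex combination keeps the reward in [0, 1] whenever both terms are, which makes it easy to compare across samples; the paper may weight or shape the two terms differently.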

Co-authors

Yuchen Yuan 1

Yusheng Zhao 1

Ming Zhu 1

Venues

ACL 1
Findings 1
