Tianle Zhang
2026
The Mark Fades: Adaptive Evolutionary Paraphrase-based Attack against LLM Watermarks
Yusheng Zhao | Jian Zhao | Tianle Zhang | Feng Wei | Xuelong Li
Findings of the Association for Computational Linguistics: ACL 2026
While LLM watermarking is essential for machine-generated content identification, existing paraphrase-based attacks struggle to balance watermark removal efficacy with text quality. We propose TSAPA, a training-free evolutionary framework that models watermark removal as a constrained multi-objective optimization problem. By leveraging genetic algorithms to navigate the Pareto front, TSAPA utilizes a Pseudo-Log-Likelihood (PLL)-guided mutation to precisely target and modify watermark-carrying tokens. Experiments on the Qwen3 series (1.7B/8B/32B) across multiple watermark schemes show that TSAPA achieves an attack success rate (ASR) above 90% while maintaining high semantic fidelity, significantly outperforming baseline methods. This work exposes critical vulnerabilities in current watermarks and provides a new perspective for robust evaluation.
Visual Attention Reasoning via Hierarchical Search and Self-Verification
Wei Cai | Jian Zhao | Yuchen Yuan | Tianle Zhang | Ming Zhu | Haichuan Tang | Xuelong Li
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Multimodal Large Language Models (MLLMs) frequently hallucinate due to their reliance on fragile, linear reasoning and weak visual grounding. We propose Visual Attention Reasoning (VAR), a reinforcement learning framework that reformulates reasoning as a hierarchical search with self-verification. VAR enforces traceable evidence grounding by generating explicit bounding boxes, guided by a novel reward function combining geometric precision and semantic sufficiency. Furthermore, it replaces linear Chain-of-Thought with a tree-search policy capable of backtracking to correct logical errors. Theoretical analysis validates the framework’s reliability, and extensive experiments demonstrate that VAR significantly outperforms state-of-the-art methods on complex hallucination and safety benchmarks.