Ziliang Wang

2026

VulAgent: Hypothesis-Validation Driven Multi-Agent Architecture for Vulnerability Detection
Ziliang Wang | Ge Li | Jia Li | Hao Zhu | Zhi Jin
Findings of the Association for Computational Linguistics: ACL 2026

Vulnerability detection with language models is challenging: it requires (i) precisely localizing security-sensitive code and (ii) reasoning about potential vulnerability conditions under complex, partially observed program context. We present VulAgent, a multi-agent vulnerability detection framework based on hypothesis validation. Our design is inspired by how human auditors review code: when noticing a sensitive operation, they form a hypothesis about a possible vulnerability, consider potential trigger paths, and then verify the hypothesis against the project context. Given a code unit, VulAgent first applies multi-view analyzers to identify and localize security-sensitive operations from complementary perspectives. For each sensitive operation, it then constructs an explicit vulnerability hypothesis—including triggering (or exploitation) preconditions and a candidate trigger path—and validates the hypothesis using project context together with the model’s general knowledge of commonly used APIs and security patterns. This validation-oriented design reduces speculative reports and substantially lowers false positives. Across PrimeVul and SVEN, VulAgent improves accuracy by 6.6 percentage points on average, increases vulnerable–fixed pair identification by up to 4.5x (2.46x on average), and reduces false positive rate by 36% relative to recent LLM-based baselines.

2025

pdf bib abs

StepSearch: Igniting LLMs Search Ability via Step-Wise Proximal Policy Optimization
Xuhui Zheng | Kang An | Ziliang Wang | Yuhang Wang | Yichao Wu
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

Efficient multi-hop reasoning requires Large Language Models (LLMs) based agents to acquire high-value external knowledge iteratively. Previous work has explored reinforcement learning (RL) to train LLMs to perform search-based document retrieval, achieving notable improvements in QA performance, but underperform on complex, multi-hop QA resulting from the sparse rewards from global signal only. To address this gap in existing research, we introduce StepSearch, a framework for search LLMs that trained with step-wise proximal policy optimization method. It consists of richer and more detailed intermediate search rewards and token-level process supervision based on information gain and redundancy penalties to better guide each search step. We constructed a fine-grained question-answering dataset containing sub-question-level search trajectories based on open source datasets through a set of data pipeline method. On standard multi-hop QA benchmarks, it significantly outperforms global-reward baselines, achieving 11.2% and 4.2% absolute improvements for 3B and 7B models over various search with RL baselines using only 19k training data, demonstrating the effectiveness of fine-grained, stepwise supervision in optimizing deep search LLMs. The project is open source at https://github.com/Zillwang/StepSearch

Co-authors

Yichao Wu 1

Xuhui Zheng 1

Hao Zhu 1

Venues

EMNLP1
Findings1

Fix author