Pengfei Yang

2026

Integrating textual graphs into Large Language Models (LLMs) is promising for complex graph-based QA. However, a key bottleneck is retrieving informative yet compact subgraphs that fit the LLM context. Existing retrievers often struggle, relying either on shallow embedding similarity or costly interactive policies that require excessive supervision. To address these challenges, we introduce Graph-S³, an agentic textual graph reasoning framework featuring an LLM-based retriever trained with synthetic stepwise supervision. Rather than relying on final answer rewards, which often yield sparse and unstable signals, we optimize the retriever by evaluating each step against offline-extracted golden subgraphs. Our approach distills golden subgraphs via a specialized data synthesis pipeline to formulate dense rewards, facilitating a two-stage training scheme that effectively learns the interactive graph exploration policy. Based on extensive experiments on three common datasets in comparison with seven strong baselines, our approach achieves an average improvement of 15.6% in accuracy and 17.2% in F₁ score. The advantage is even higher in more complicated multi-hop reasoning tasks.


While prompt engineering enhances the capabilities of Large Language Models (LLMs), it also exposes critical safety concerns. Due to the inherent brittleness of their static safety boundaries, LLMs are vulnerable to jailbreak prompts, i.e., adversarial inputs designed to bypass safeguards and induce the generation of harmful content. Existing detection mechanisms rely on static model components or fixed decision thresholds, limiting their ability to generalize to evolving attack patterns and continual model updates. To bridge this gap, we propose RLShield, a dynamic jailbreak detection framework that employs reinforcement learning for adaptive threshold selection. RLShield incorporates three key innovations: (i) a dynamic retrieval and LLM-based rewriting module to simulate diverse adversarial contexts; (ii) a cross-layer representation analysis to pinpoint safety-critical parameters; and (iii) a Soft Actor-Critic (SAC) based agent that learns to predict optimal, sample-specific detection thresholds. Experimental results demonstrate that RLShield consistently outperforms state-of-the-art baselines in detection performance while maintaining high computational efficiency. Notably, it improves F1 by up to 7.3% while achieving an average 3× gain in inference efficiency across multiple LLM backbones.

Co-authors

Shu Wu 1

Venues

ACL 1
Findings 1
