Shijing Hu


2026

In LLM-based Text-to-SQL systems, unanswerable and underspecified user queries may produce not only incorrect text but also executable programs that return misleading results or violate safety constraints, which poses a major barrier to safe deployment. Existing refusal strategies for such queries either rely on output-level instruction following, which is brittle due to model hallucinations, or on estimating output uncertainty, which adds complexity and overhead. To address this challenge, we first formalize safe refusal in Text-to-SQL systems as an answerability-gating problem, and then propose **LatentRefusal**, a latent-signal refusal mechanism that predicts query answerability from intermediate hidden activations of an LLM. We introduce the Tri-Residual Gated Encoder (TRGE), a lightweight probing architecture, to suppress schema noise and amplify sparse, localized question–schema mismatch cues that indicate unanswerability. Extensive empirical evaluations across diverse ambiguous and unanswerable settings, together with ablations and interpretability analyses, demonstrate the effectiveness of the proposed scheme and show that **LatentRefusal** provides an attachable, efficient safety layer for Text-to-SQL systems. Across four benchmarks, **LatentRefusal** achieves average F1 scores of 88.5% and 88.8% on Llama-3.1-8B and Qwen-3-8B, respectively, while adding approximately 2 ms of probe overhead.
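To make the answerability-gating formulation concrete, the following is a minimal sketch and not the TRGE architecture itself: a plain logistic probe over mean-pooled hidden activations stands in for the probe, and SQL generation runs only if the predicted answerability clears a threshold. All names here (`mean_pool`, `answerable_probability`, `gated_text_to_sql`, `REFUSAL`), the pooling choice, and the default threshold of 0.5 are illustrative assumptions rather than details taken from the paper.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical refusal message; the paper does not specify its wording.
REFUSAL = "REFUSE: the question cannot be answered from the given schema."


def mean_pool(hidden_states: Sequence[Sequence[float]]) -> List[float]:
    """Average per-token activation vectors into a single feature vector."""
    n_tokens = len(hidden_states)
    dim = len(hidden_states[0])
    return [sum(tok[i] for tok in hidden_states) / n_tokens for i in range(dim)]


def answerable_probability(
    features: Sequence[float], weights: Sequence[float], bias: float
) -> float:
    """Logistic probe: a stand-in for a trained probe over latent activations."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def gated_text_to_sql(
    hidden_states: Sequence[Sequence[float]],
    weights: Sequence[float],
    bias: float,
    generate_sql: Callable[[], str],
    threshold: float = 0.5,  # assumed value, tuned on held-out data in practice
) -> str:
    """Refuse before decoding when the probe predicts the query is unanswerable."""
    p = answerable_probability(mean_pool(hidden_states), weights, bias)
    if p < threshold:
        return REFUSAL  # no SQL is generated or executed
    return generate_sql()
```

The gate is evaluated before `generate_sql` is called, so a refused query never produces an executable program; this is the property that separates gating on latent signals from checking the generated output after the fact.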