Unmasking Database Vulnerabilities: Zero-Knowledge Schema Inference Attacks in Text-to-SQL Systems

Đorđe Klisura; Anthony Rios

Unmasking Database Vulnerabilities: Zero-Knowledge Schema Inference Attacks in Text-to-SQL Systems

Abstract

Text-to-SQL systems empower users to interact with databases using natural language, automatically translating queries into executable SQL code. However, their reliance on database schema information for SQL generation exposes them to significant security vulnerabilities, particularly schema inference attacks that can lead to unauthorized data access or manipulation. In this paper, we introduce a novel zero-knowledge framework for reconstructing the underlying database schema of text-to-SQL models without any prior knowledge of the database. Our approach systematically probes text-to-SQL models with specially crafted questions and leverages a surrogate GPT-4 model to interpret the outputs, effectively uncovering hidden schema elements—including tables, columns, and data types. We demonstrate that our method achieves high accuracy in reconstructing table names, with F1 scores of up to .99 for generative models and .78 for fine-tuned models, underscoring the severity of schema leakage risks. We also show that our attack can steal prompt information in non-text-to-SQL models. Furthermore, we propose a simple protection mechanism for generative models and empirically show its limitations in mitigating these attacks.

Anthology ID:: 2025.findings-naacl.386
Volume:: Findings of the Association for Computational Linguistics: NAACL 2025
Month:: April
Year:: 2025
Address:: Albuquerque, New Mexico
Editors:: Luis Chiruzzo, Alan Ritter, Lu Wang
Venue:: Findings
SIG:
Publisher:: Association for Computational Linguistics
Note:
Pages:: 6954–6976
Language:
URL:: https://preview.aclanthology.org/fix-sig-urls/2025.findings-naacl.386/
DOI:
Bibkey:
Cite (ACL):: Đorđe Klisura and Anthony Rios. 2025. Unmasking Database Vulnerabilities: Zero-Knowledge Schema Inference Attacks in Text-to-SQL Systems. In Findings of the Association for Computational Linguistics: NAACL 2025, pages 6954–6976, Albuquerque, New Mexico. Association for Computational Linguistics.
Cite (Informal):: Unmasking Database Vulnerabilities: Zero-Knowledge Schema Inference Attacks in Text-to-SQL Systems (Klisura & Rios, Findings 2025)
Copy Citation:
PDF:: https://preview.aclanthology.org/fix-sig-urls/2025.findings-naacl.386.pdf

PDF Cite Search Fix data