Beyond Text-to-SQL for IoT Defense: A Comprehensive Framework for Querying and Classifying IoT Threats

Ryan Pavlich; Nima Ebadi; Richard Tarbell; Billy Linares; Adrian Tan; Rachael Humphreys; Jayanta Das; Rambod Ghandiparsi; Hannah Haley; Jerris George; Rocky Slavin; Kim-Kwang Raymond Choo; Glenn Dietrich; Anthony Rios

Abstract

Recognizing the promise of natural language interfaces to databases, prior studies have emphasized the development of text-to-SQL systems. Existing research has generally focused on generating SQL statements from text queries, whereas the broader challenge lies in inferring new information about the returned data. Our research makes two major contributions to address this gap. First, we introduce a novel Internet-of-Things (IoT) text-to-SQL dataset comprising 10,985 text-SQL pairs and 239,398 rows of network traffic activity. The dataset contains query types that are limited in prior text-to-SQL datasets, notably temporal queries. It is sourced from a smart building’s IoT ecosystem and covers sensor readings and network traffic data. Second, our dataset allows two-stage processing, in which the data returned by a generated SQL query (network traffic) can be classified as malicious or not. Our results show that joint training to query and infer information about the data improves overall text-to-SQL performance, nearly matching that of substantially larger models. We also show that current large language models (e.g., GPT-3.5) struggle to infer new information about returned data (i.e., they perform poorly at tabular data understanding); thus, our dataset provides a novel test bed for integrating complex domain-specific reasoning into LLMs.
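The two-stage idea in the abstract (run a generated SQL query, then label the returned rows) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the `traffic` table and its columns, the sample rows, the hard-coded `GENERATED_SQL` string standing in for a text-to-SQL model's output, and the `toy_rule` function standing in for a learned classifier.

```python
import sqlite3

def run_two_stage(sql, rows, is_malicious):
    """Stage 1: execute an already-generated SQL query.
    Stage 2: label every returned row with the supplied classifier."""
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row  # allows access to columns by name
    # Hypothetical schema; the paper's actual schema is not given in the abstract.
    conn.execute(
        "CREATE TABLE traffic (ts TEXT, src_ip TEXT, dst_port INTEGER, n_bytes INTEGER)"
    )
    conn.executemany("INSERT INTO traffic VALUES (?, ?, ?, ?)", rows)
    returned = [dict(r) for r in conn.execute(sql)]
    conn.close()
    return [(r, is_malicious(r)) for r in returned]

# Hypothetical network traffic rows (not from the dataset).
ROWS = [
    ("2024-01-01 09:00:00", "10.0.0.5", 443, 1200),
    ("2024-01-01 09:30:00", "10.0.0.9", 23, 90000),
    ("2024-01-01 11:00:00", "10.0.0.5", 443, 800),
]

# Stand-in for model output on a temporal question such as
# "Show traffic between 9 and 10 am on January 1".
GENERATED_SQL = (
    "SELECT * FROM traffic "
    "WHERE ts >= '2024-01-01 09:00:00' AND ts < '2024-01-01 10:00:00' "
    "ORDER BY ts"
)

def toy_rule(row):
    # Placeholder heuristic, not the paper's classifier:
    # flag large transfers to the Telnet port.
    return row["dst_port"] == 23 and row["n_bytes"] > 50000

results = run_two_stage(GENERATED_SQL, ROWS, toy_rule)
for row, flagged in results:
    print(row["ts"], row["src_ip"], "malicious" if flagged else "benign")
```

In this sketch the temporal filter selects the two rows inside the one-hour window, and only then is each returned row labeled; in practice a trained model would replace both the fixed SQL string and the rule.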

Anthology ID: 2025.trustnlp-main.1
Volume: Proceedings of the 5th Workshop on Trustworthy NLP (TrustNLP 2025)
Month: May
Year: 2025
Address: Albuquerque, New Mexico
Editors: Trista Cao, Anubrata Das, Tharindu Kumarage, Yixin Wan, Satyapriya Krishna, Ninareh Mehrabi, Jwala Dhamala, Anil Ramakrishna, Aram Galystan, Anoop Kumar, Rahul Gupta, Kai-Wei Chang
Venues: TrustNLP | WS
Publisher: Association for Computational Linguistics
Pages: 1–12
URL: https://preview.aclanthology.org/fix-sig-urls/2025.trustnlp-main.1/
Cite (ACL): Ryan Pavlich, Nima Ebadi, Richard Tarbell, Billy Linares, Adrian Tan, Rachael Humphreys, Jayanta Das, Rambod Ghandiparsi, Hannah Haley, Jerris George, Rocky Slavin, Kim-Kwang Raymond Choo, Glenn Dietrich, and Anthony Rios. 2025. Beyond Text-to-SQL for IoT Defense: A Comprehensive Framework for Querying and Classifying IoT Threats. In Proceedings of the 5th Workshop on Trustworthy NLP (TrustNLP 2025), pages 1–12, Albuquerque, New Mexico. Association for Computational Linguistics.
Cite (Informal): Beyond Text-to-SQL for IoT Defense: A Comprehensive Framework for Querying and Classifying IoT Threats (Pavlich et al., TrustNLP 2025)
PDF: https://preview.aclanthology.org/fix-sig-urls/2025.trustnlp-main.1.pdf
