SQLong: Enhanced NL2SQL for Longer Contexts with LLMs

Dai Quoc Nguyen, Cong Duy Vu Hoang, Duy Quang Vu, Gioacchino Tangari, Thanh Vu, Don Dharmasiri, Yuan-Fang Li, Long Duong


Abstract
Open-weight large language models (LLMs) have significantly advanced performance in the Natural Language to SQL (NL2SQL) task. However, their effectiveness diminishes when dealing with large database schemas, as the context length increases. To address this limitation, we present SQLong, a novel and efficient data augmentation framework designed to enhance LLM performance in long-context scenarios for the NL2SQL task. SQLong generates augmented datasets by extending existing database schemas with additional synthetic CREATE TABLE commands and corresponding data rows, sampled from diverse schemas in the training data. This approach effectively simulates long-context scenarios during finetuning and evaluation. Through experiments on the Spider and BIRD datasets, we demonstrate that LLMs finetuned with SQLong-augmented data significantly outperform those trained on standard datasets. These results highlight SQLong's practicality and its impact on improving NL2SQL capabilities in real-world settings with complex database schemas.
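
Illustrative sketch (Python): the augmentation idea described in the abstract can be pictured as padding a target database schema with CREATE TABLE statements sampled from other schemas in the training set until a desired context length is reached. The code below is a minimal sketch of that idea under these assumptions, not the authors' released implementation; all names (augment_schema, target_length, etc.) are hypothetical, and the synthetic data rows mentioned in the abstract are omitted for brevity.

import random

def augment_schema(target_schema: str,
                   other_schemas: list[str],
                   target_length: int,
                   seed: int = 0) -> str:
    # Pad `target_schema` with CREATE TABLE statements sampled from other
    # databases until the prompt reaches roughly `target_length` characters,
    # simulating a long-context schema. Illustrative sketch only.
    rng = random.Random(seed)

    # Collect distractor CREATE TABLE statements from unrelated schemas.
    distractors = []
    for schema in other_schemas:
        distractors.extend(
            stmt.strip() + ";" for stmt in schema.split(";")
            if "CREATE TABLE" in stmt.upper()
        )
    rng.shuffle(distractors)

    # Append distractor tables until the augmented schema is long enough.
    parts = [target_schema.strip()]
    length = len(parts[0])
    for stmt in distractors:
        if length >= target_length:
            break
        parts.append(stmt)
        length += len(stmt)

    # Shuffle so the gold tables do not always appear first in the prompt.
    rng.shuffle(parts)
    return "\n\n".join(parts)

if __name__ == "__main__":
    gold = "CREATE TABLE singer (singer_id INT, name TEXT, country TEXT);"
    extra = [
        "CREATE TABLE stadium (stadium_id INT, capacity INT);",
        "CREATE TABLE concert (concert_id INT, stadium_id INT, year INT);",
    ]
    print(augment_schema(gold, extra, target_length=120))

In a finetuning pipeline, such an augmented schema string would replace the original schema in the NL2SQL prompt, while the question and gold SQL query stay unchanged.
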
Anthology ID:
2025.trl-workshop.5
Volume:
Proceedings of the 4th Table Representation Learning Workshop
Month:
July
Year:
2025
Address:
Vienna, Austria
Editors:
Shuaichen Chang, Madelon Hulsebos, Qian Liu, Wenhu Chen, Huan Sun
Venues:
TRL | WS
Publisher:
Association for Computational Linguistics
Pages:
47–55
URL:
https://preview.aclanthology.org/acl25-workshop-ingestion/2025.trl-workshop.5/
Cite (ACL):
Dai Quoc Nguyen, Cong Duy Vu Hoang, Duy Quang Vu, Gioacchino Tangari, Thanh Vu, Don Dharmasiri, Yuan-Fang Li, and Long Duong. 2025. SQLong: Enhanced NL2SQL for Longer Contexts with LLMs. In Proceedings of the 4th Table Representation Learning Workshop, pages 47–55, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
SQLong: Enhanced NL2SQL for Longer Contexts with LLMs (Nguyen et al., TRL 2025)
PDF:
https://preview.aclanthology.org/acl25-workshop-ingestion/2025.trl-workshop.5.pdf