UNJOIN: Enhancing Multi-Table Text-to-SQL Generation via Schema Simplification

Poojah Ganesan, Rajat Aayush Jha, Dan Roth, Vivek Gupta


Abstract
Recent advances in large language models (LLMs) have greatly improved Text-to-SQL performance for single-table queries. However, Text-to-SQL remains challenging for multi-table databases because of complex schemas and relational operations. Existing methods often struggle to retrieve the right tables and columns, generate accurate JOINs and UNIONs, and generalize across diverse schemas. To address these issues, we introduce UNJOIN, a two-stage framework that decouples the retrieval of schema elements from SQL logic generation. In the first stage, we merge the column names of all tables in the database into a single-table representation by prefixing each column with its table name. This allows the model to focus purely on accurate retrieval without being distracted by the need to write complex SQL logic. In the second stage, the SQL query is generated on this simplified schema and mapped back to the original schema by reconstructing JOINs, UNIONs, and relational logic. Evaluations on the SPIDER and BIRD datasets show that UNJOIN matches or exceeds state-of-the-art baselines. UNJOIN uses only schema information and requires neither data access nor fine-tuning, making it scalable and adaptable across databases. Our code is available at: https://github.com/coral-lab-asu/unjoin
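The two stages described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation (that is in the linked repository): the `__` separator, the virtual table name `unified`, the toy schema and foreign-key list, and the helpers `flatten_schema`, `fk_path`, and `unflatten_sql` are all hypothetical. The mapping back is deliberately simplified: it reconstructs only JOINs along shortest foreign-key paths, not UNIONs or other relational logic.

```python
import re
from collections import deque

# Hypothetical conventions: the abstract says only that each column is
# prefixed with its table name, so separator and table name are assumptions.
SEP = "__"
VIRTUAL_TABLE = "unified"


def flatten_schema(schema):
    """Stage 1 sketch: merge every table's columns into one table-prefixed list."""
    return [f"{table}{SEP}{col}" for table, cols in schema.items() for col in cols]


def fk_path(start, goal, fks):
    """Breadth-first search over foreign keys; returns (t1, c1, t2, c2) edges."""
    graph = {}
    for t1, c1, t2, c2 in fks:  # foreign keys are traversable in both directions
        graph.setdefault(t1, []).append((t1, c1, t2, c2))
        graph.setdefault(t2, []).append((t2, c2, t1, c1))
    prev = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            break
        for edge in graph.get(node, []):
            if edge[2] not in prev:
                prev[edge[2]] = edge
                queue.append(edge[2])
    if goal not in prev:
        raise ValueError(f"no foreign-key path from {start} to {goal}")
    path, node = [], goal
    while prev[node] is not None:  # walk back from goal to start
        path.append(prev[node])
        node = prev[node][0]
    return path[::-1]


def unflatten_sql(sql, schema, fks):
    """Stage 2 sketch (JOINs only): map prefixed columns back to table.column
    and rebuild the FROM clause from foreign-key paths."""
    names = sorted(schema, key=len, reverse=True)  # try longest table names first
    pattern = re.compile(
        r"\b(" + "|".join(map(re.escape, names)) + r")" + re.escape(SEP) + r"(\w+)\b"
    )
    # Tables referenced by the query, in first-seen order.
    tables = list(dict.fromkeys(t for t, _ in pattern.findall(sql)))
    if not tables:
        raise ValueError("query references no prefixed columns")
    sql = pattern.sub(lambda m: f"{m.group(1)}.{m.group(2)}", sql)
    from_clause, joined = tables[0], {tables[0]}
    for target in tables[1:]:
        for t1, c1, t2, c2 in fk_path(tables[0], target, fks):
            if t2 not in joined:  # skip tables already in the join tree
                from_clause += f" JOIN {t2} ON {t1}.{c1} = {t2}.{c2}"
                joined.add(t2)
    return re.sub(
        r"\bFROM\s+" + VIRTUAL_TABLE + r"\b",
        lambda m: "FROM " + from_clause,
        sql,
        flags=re.IGNORECASE,
    )


# Illustrative toy schema (hypothetical), with a bridge table.
SCHEMA = {
    "singer": ["singer_id", "name", "age"],
    "concert": ["concert_id", "concert_name", "year"],
    "singer_in_concert": ["concert_id", "singer_id"],
}
FKS = [
    ("singer_in_concert", "singer_id", "singer", "singer_id"),
    ("singer_in_concert", "concert_id", "concert", "concert_id"),
]

print(flatten_schema(SCHEMA))
# A query written against the single flattened table:
print(unflatten_sql("SELECT singer__name FROM unified WHERE concert__year = 2014", SCHEMA, FKS))
```

Here the model only has to pick `singer__name` and `concert__year`; the bridge table `singer_in_concert` and both join conditions are recovered from the foreign keys. Joining along shortest foreign-key paths is just one simple heuristic. A fuller mapping would also need to handle self-joins, aliases, and queries that require UNION.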
Anthology ID: 2026.surgellm-1.1
Volume: Proceedings of the First Workshop on Structured Understanding, Retrieval, and Generation in the LLM Era (SURGeLLM 2026)
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Vivek Gupta, Kaize Ding, Harsha Kokel, Yue Zhao, Amit Agarwal, Yu Wang, Michael Glass, Yu Zhang, Kavitha Srinivas, Xiusi Chen, Oktie Hassanzadeh, Qi Zhu, Shuaichen Chang, Yuan Luo
Venues: SURGeLLM | WS
Publisher: Association for Computational Linguistics
Pages: 1–15
URL: https://preview.aclanthology.org/ingest-acl-workshops/2026.surgellm-1.1/
Cite (ACL): Poojah Ganesan, Rajat Aayush Jha, Dan Roth, and Vivek Gupta. 2026. UNJOIN: Enhancing Multi-Table Text-to-SQL Generation via Schema Simplification. In Proceedings of the First Workshop on Structured Understanding, Retrieval, and Generation in the LLM Era (SURGeLLM 2026), pages 1–15, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): UNJOIN: Enhancing Multi-Table Text-to-SQL Generation via Schema Simplification (Ganesan et al., SURGeLLM 2026)
PDF: https://preview.aclanthology.org/ingest-acl-workshops/2026.surgellm-1.1.pdf