Bidirectional Semantic Enhancement for Schema Routing Across Large-Scale Databases

Yuyang Wu, Xiaoliang Wang, Cam-Tu Nguyen


Abstract
With the prevalence of Large Language Models (LLMs), Text-to-SQL has made significant progress, yet applying it to massive, real-world databases remains a challenge. While previous works adopt a retrieve-then-generate framework, they struggle with the profound semantic gap between user queries and vague schema definitions. Existing methods relying on unidirectional query expansion often fail to bridge lexical mismatches, while graph-based approaches struggle to navigate schemas when explicit structural links (e.g., foreign keys) are missing. To address this, we propose Bi-SR, a retrieval framework that bridges this gap through a bidirectional semantic enhancement strategy. We enrich vague table schemas offline and perform online generative query expansion, specifically predicting potential schema structures, to align with user intent. Crucially, we introduce a dual-augmented contrastive training objective that trains the dense retriever to recognize the semantic correspondence between the LLM-expanded query intent and the detailed schema descriptions. Experiments on massive schema routing benchmarks constructed from BIRD and Spider demonstrate that Bi-SR achieves state-of-the-art performance and significantly empowers smaller models for cost-effective deployment.
Anthology ID:
2026.findings-acl.369
Volume:
Findings of the Association for Computational Linguistics: ACL 2026
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
7495–7509
URL:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.369/
Cite (ACL):
Yuyang Wu, Xiaoliang Wang, and Cam-Tu Nguyen. 2026. Bidirectional Semantic Enhancement for Schema Routing Across Large-Scale Databases. In Findings of the Association for Computational Linguistics: ACL 2026, pages 7495–7509, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
Bidirectional Semantic Enhancement for Schema Routing Across Large-Scale Databases (Wu et al., Findings 2026)
PDF:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.369.pdf
Checklist:
 2026.findings-acl.369.checklist.pdf