To Describe or Not to Describe? Benchmarking Database Representations for Schema Linking in Text-to-SQL

Daiane Ucceli Kreitlow, Hilário Tomaz Alves de Oliveira


Abstract
Text-to-SQL systems translate natural-language questions into Structured Query Language (SQL) queries, enabling database access without requiring SQL expertise. In real-world scenarios, these systems often need to manage multiple databases with heterogeneous schemas, making Schema Linking a crucial preliminary step for identifying relevant databases, tables, and columns. This study investigates Schema Linking for questions written in Brazilian Portuguese and compares two schema representation strategies: natural-language descriptions generated by Large Language Models (LLMs) and representations based on Data Definition Language (DDL) and Data Manipulation Language (DML) commands. Experiments on a Brazilian Portuguese version of the Spider dataset, with over 200 databases, evaluated several LLMs and embedding models. Results measured with Hit@k show that natural-language descriptions consistently outperform DDL/DML-based representations, demonstrating the effectiveness of LLM-generated schema descriptions for Schema Linking tasks.
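For readers unfamiliar with the metric, the following is a minimal sketch of how Hit@k is commonly computed for a retrieval-style Schema Linking task. It assumes the usual definition (a question counts as a hit when its gold database appears among the top k retrieved candidates); the abstract does not spell out the paper's exact formulation. The function names `hit_at_k` and `mean_hit_at_k` and the sample database identifiers are illustrative assumptions, not taken from the paper.

```python
from typing import Sequence


def hit_at_k(ranked: Sequence[str], gold: str, k: int) -> int:
    """Return 1 if the gold item is among the first k ranked candidates, else 0."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    return int(gold in ranked[:k])


def mean_hit_at_k(
    rankings: Sequence[Sequence[str]], golds: Sequence[str], k: int
) -> float:
    """Average Hit@k over a set of questions (assumed definition, see lead-in)."""
    if len(rankings) != len(golds):
        raise ValueError("rankings and golds must have the same length")
    if not golds:
        return 0.0
    hits = sum(hit_at_k(r, g, k) for r, g in zip(rankings, golds))
    return hits / len(golds)


# Illustrative data: one ranked candidate list per question (hypothetical values).
rankings = [
    ["concert_singer", "pets_1", "world_1"],
    ["pets_1", "car_1", "flight_2"],
    ["car_1", "world_1", "flight_2"],
]
golds = ["concert_singer", "car_1", "flight_2"]

for k in (1, 2, 3):
    print(f"Hit@{k} = {mean_hit_at_k(rankings, golds, k):.3f}")
```

Under this definition, two schema representations can be compared by ranking the candidate databases with each one and comparing the resulting Hit@k values at the same k.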
Anthology ID:
2026.propor-1.15
Volume:
Proceedings of the 17th International Conference on Computational Processing of Portuguese (PROPOR 2026) - Vol. 1
Month:
April
Year:
2026
Address:
Salvador, Brazil
Editors:
Marlo Souza, Iria de-Dios-Flores, Diana Santos, Larissa Freitas, Jackson Wilke da Cruz Souza, Eugénio Ribeiro
Venue:
PROPOR
Publisher:
Association for Computational Linguistics
Pages:
151–160
URL:
https://preview.aclanthology.org/ingest-dnd/2026.propor-1.15/
Cite (ACL):
Daiane Ucceli Kreitlow and Hilário Tomaz Alves de Oliveira. 2026. To Describe or Not to Describe? Benchmarking Database Representations for Schema Linking in Text-to-SQL. In Proceedings of the 17th International Conference on Computational Processing of Portuguese (PROPOR 2026) - Vol. 1, pages 151–160, Salvador, Brazil. Association for Computational Linguistics.
Cite (Informal):
To Describe or Not to Describe? Benchmarking Database Representations for Schema Linking in Text-to-SQL (Kreitlow & Oliveira, PROPOR 2026)
PDF:
https://preview.aclanthology.org/ingest-dnd/2026.propor-1.15.pdf