Text2Sql: Pure Fine-Tuning and Pure Knowledge Distillation

Gao yu Zhu, Wei Shao, Xichou Zhu, Lei Yu, Jiafeng Guo, Xueqi Cheng


Abstract
Text2Sql is a task that converts natural language questions into SQL queries. In previous research on LLM fine-tuning, researchers typically input both the entire database schema and the natural language question into the model. This approach has two issues: 1) the model’s context is limited when dealing with a large number of database tables; 2) the question is often related to only a few tables, so the prompt carries excessive irrelevant information that distracts the model. To address these issues, we employed a pure fine-tuning strategy to reduce redundancy. The model fine-tuned with pure prompts, which are only 53% of the baseline length, outperforms the baseline (fine-tuned with all tables in the prompt) by 8.2% and 8.6% in Test-suite accuracy (TS) and exact-set-match accuracy (EM), respectively, on the Spider dev set. With the most refined prompts on the Spider dev set, the model achieves TS and EM scores of 73.5% and 75.4%, respectively, approaching state-of-the-art (SOTA) levels. To leverage the capabilities of the model trained with pure prompts, we applied a pure knowledge distillation strategy to transfer its abilities. The distilled student model achieved a 1.9% improvement in TS, while the teacher model’s prompt length was only 23% of that of the student model.
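The core idea of the pure prompt is to serialize only the tables relevant to the question rather than the full database schema. The following minimal Python sketch illustrates that idea under stated assumptions: the function name `build_pruned_prompt`, the prompt layout, and the upstream source of `relevant_tables` are illustrative and not taken from the paper.

```python
# Illustrative sketch only: one way to build a "pure" Text2SQL prompt that keeps
# just the tables relevant to the question instead of serializing every table in
# the database schema. Table selection (`relevant_tables`) is assumed to come
# from some upstream filter; the paper's exact prompt format is not reproduced.

def build_pruned_prompt(schema: dict[str, list[str]],
                        relevant_tables: set[str],
                        question: str) -> str:
    """Serialize only the relevant tables, then append the question."""
    lines = []
    for table, columns in schema.items():
        if table in relevant_tables:  # drop tables unrelated to the question
            cols = ", ".join(columns)
            lines.append(f"CREATE TABLE {table} ({cols});")
    lines.append(f"-- Question: {question}")
    lines.append("-- SQL:")
    return "\n".join(lines)

# Example: a three-table schema pruned down to the single table the question needs.
schema = {
    "singer": ["singer_id", "name", "age"],
    "concert": ["concert_id", "theme", "year"],
    "stadium": ["stadium_id", "location", "capacity"],
}
print(build_pruned_prompt(schema, {"singer"}, "How many singers are older than 30?"))
```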
Anthology ID:
2025.naacl-industry.5
Volume:
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 3: Industry Track)
Month:
April
Year:
2025
Address:
Albuquerque, New Mexico
Editors:
Weizhu Chen, Yi Yang, Mohammad Kachuee, Xue-Yong Fu
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
54–61
URL:
https://preview.aclanthology.org/fix-sig-urls/2025.naacl-industry.5/
Cite (ACL):
Gao yu Zhu, Wei Shao, Xichou Zhu, Lei Yu, Jiafeng Guo, and Xueqi Cheng. 2025. Text2Sql: Pure Fine-Tuning and Pure Knowledge Distillation. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 3: Industry Track), pages 54–61, Albuquerque, New Mexico. Association for Computational Linguistics.
Cite (Informal):
Text2Sql: Pure Fine-Tuning and Pure Knowledge Distillation (Zhu et al., NAACL 2025)
PDF:
https://preview.aclanthology.org/fix-sig-urls/2025.naacl-industry.5.pdf