Text2Sql: Pure Fine-Tuning and Pure Knowledge Distillation
Gao yu Zhu, Wei Shao, Xichou Zhu, Lei Yu, Jiafeng Guo, Xueqi Cheng
Abstract
Text2Sql is a task that converts natural language questions into SQL queries. In previous research on LLM fine-tuning, researchers typically input both the entire database schema and the natural language question into the model. This approach has two issues: 1) the model’s context is limited when dealing with a large number of database tables; 2) the question is often related to only a few tables, so the excess irrelevant information distracts the model. To address these issues, we employed a pure fine-tuning strategy to reduce redundancy. Fine-tuned with pure prompts that are only 53% of the baseline length, the model outperforms the baseline (fine-tuned with all tables in the prompt) by 8.2% and 8.6% in Test-suite accuracy (TS) and exact-set-match accuracy (EM), respectively, on the Spider dev set. With the most refined prompts on the Spider dev set, the model achieves TS and EM scores of 73.5% and 75.4%, respectively, approaching state-of-the-art (SOTA) levels. To leverage the capabilities of the model trained with pure prompts, we applied a pure knowledge distillation strategy to transfer its abilities. The distilled student model achieved a 1.9% improvement in TS, while the teacher model’s prompt length was only 23% of that of the student model.
- Anthology ID: 2025.naacl-industry.5
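The abstract describes "pure" prompts that keep only the tables relevant to a question rather than the full database schema. The sketch below illustrates that idea only; it is not the authors' code. The function name `build_pure_prompt`, the toy schema, the `CREATE TABLE` serialization, and the upstream step that supplies `relevant_tables` are all assumptions made for illustration.

```python
# Minimal sketch (assumed, not the paper's implementation): serialize only the
# tables relevant to the question into the prompt, instead of the full schema.
def build_pure_prompt(question: str, schema: dict[str, list[str]],
                      relevant_tables: list[str]) -> str:
    """Keep only relevant tables in the prompt, then append the question."""
    lines = []
    for table in relevant_tables:
        columns = ", ".join(schema[table])
        lines.append(f"CREATE TABLE {table} ({columns});")
    lines.append(f"-- Question: {question}")
    lines.append("-- SQL:")
    return "\n".join(lines)

if __name__ == "__main__":
    # Toy Spider-like schema; the question only touches the `singer` table,
    # so the other tables are left out of the prompt entirely.
    schema = {
        "singer": ["singer_id", "name", "country", "age"],
        "concert": ["concert_id", "venue", "year"],
        "stadium": ["stadium_id", "location", "capacity"],
    }
    print(build_pure_prompt("How many singers are from France?", schema, ["singer"]))
```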
- Volume: Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 3: Industry Track)
- Month: April
- Year: 2025
- Address: Albuquerque, New Mexico
- Editors: Weizhu Chen, Yi Yang, Mohammad Kachuee, Xue-Yong Fu
- Venue: NAACL
- Publisher: Association for Computational Linguistics
- Pages: 54–61
- URL: https://preview.aclanthology.org/fix-sig-urls/2025.naacl-industry.5/
- Cite (ACL): Gao yu Zhu, Wei Shao, Xichou Zhu, Lei Yu, Jiafeng Guo, and Xueqi Cheng. 2025. Text2Sql: Pure Fine-Tuning and Pure Knowledge Distillation. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 3: Industry Track), pages 54–61, Albuquerque, New Mexico. Association for Computational Linguistics.
- Cite (Informal): Text2Sql: Pure Fine-Tuning and Pure Knowledge Distillation (Zhu et al., NAACL 2025)
- PDF: https://preview.aclanthology.org/fix-sig-urls/2025.naacl-industry.5.pdf