Shaobin Shi




2025

GenLink: Generation-Driven Schema-Linking via Multi-Model Learning for Text-to-SQL
Zhifeng Hao | Junqi Huang | Shaobin Shi | Ruichu Cai | Boyan Xu
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

Schema linking is widely recognized as a key factor in improving text-to-SQL performance. Supervised fine-tuning approaches enhance SQL generation quality by explicitly fine-tuning schema linking as an extraction task. However, they suffer from two major limitations: (i) The training corpus of small language models restricts their cross-domain generalization ability. (ii) The extraction-based fine-tuning process struggles to capture complex linking patterns. To address these issues, we propose GenLink, a generation-driven schema-linking framework based on multi-model learning. Instead of explicitly extracting schema elements, GenLink enhances linking through a generation-based learning process, effectively capturing implicit schema relationships. By integrating multiple small language models, GenLink improves schema-linking recall rate and ensures robust cross-domain adaptability. Experimental results on the BIRD and Spider benchmarks validate the effectiveness of GenLink, achieving execution accuracies of 67.34% (BIRD), 89.7% (Spider development set), and 87.8% (Spider test set), demonstrating its superiority in handling diverse and complex database schemas.
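Below is a minimal, purely illustrative sketch of what "generation-driven" schema linking with multiple models could look like, written to clarify the idea in the abstract; it is not the authors' GenLink implementation. The model functions, names, and schema are all hypothetical stand-ins: each "model" generates the schema elements it considers relevant, and the union of the generations is kept (after filtering against the real schema) to favor recall.

# Illustrative sketch only, NOT the GenLink implementation.
# Each SchemaLinker stands in for a small language model that, given a question
# and a schema description, *generates* relevant schema elements rather than
# extracting spans; the union over models is kept to improve recall.

from typing import Callable, Iterable, Set

# A "model" is just a function: (question, schema_description) -> generated schema elements.
SchemaLinker = Callable[[str, str], Iterable[str]]

def link_schema(question: str,
                schema_description: str,
                models: Iterable[SchemaLinker],
                valid_elements: Set[str]) -> Set[str]:
    """Union the schema elements generated by several models, keeping only
    elements that actually exist in the database schema."""
    linked: Set[str] = set()
    for model in models:
        for element in model(question, schema_description):
            if element in valid_elements:  # discard hallucinated tables/columns
                linked.add(element)
    return linked

# Toy stand-ins for two small language models (hypothetical outputs).
def model_a(question: str, schema: str) -> Iterable[str]:
    return ["singer.name", "singer.age"]

def model_b(question: str, schema: str) -> Iterable[str]:
    return ["singer.age", "concert.year"]

if __name__ == "__main__":
    valid = {"singer.name", "singer.age", "concert.year", "concert.stadium_id"}
    linked = link_schema("How old is each singer?",
                         "singer(name, age); concert(year, stadium_id)",
                         [model_a, model_b], valid)
    print(linked)  # union of both models' generations

In the actual framework the linking is learned as a generation task and the linked schema then conditions SQL generation; the union-and-filter step above is only meant to show why combining several small models can raise recall across domains.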

Track-SQL: Enhancing Generative Language Models with Dual-Extractive Modules for Schema and Context Tracking in Multi-turn Text-to-SQL
Bingfeng Chen | Shaobin Shi | Yongqi Luo | Boyan Xu | Ruichu Cai | Zhifeng Hao
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

Generative language models have shown significant potential in single-turn Text-to-SQL. However, their performance does not extend equivalently to multi-turn Text-to-SQL. This is primarily due to generative language models’ inadequacy in handling the complexities of context information and dynamic schema linking in multi-turn interactions. In this paper, we propose a framework named Track-SQL, which enhances generative language models with dual-extractive modules designed to track schema and contextual changes in multi-turn Text-to-SQL. Specifically, Track-SQL incorporates a Semantic-enhanced Schema Extractor and a Schema-aware Context Extractor. Experimental results demonstrate that Track-SQL achieves state-of-the-art performance on the SparC and CoSQL datasets. Furthermore, detailed ablation studies reveal that Track-SQL significantly improves execution accuracy in multi-turn interactions by 7.1% and 9.55% on these datasets, respectively. Our implementation will be open-sourced at https://github.com/DMIRLAB-Group/Track-SQL.
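The following sketch is a simplified, hypothetical illustration of tracking schema and context across turns, included only to make the abstract's idea concrete; it is not the Track-SQL architecture, whose two extractors are learned modules rather than the keyword heuristics used here. All names and the toy schema are assumptions.

# Illustrative sketch only, NOT the Track-SQL implementation.
# A dialogue state accumulates linked schema elements and context mentions
# turn by turn, so the SQL generator at each turn can be conditioned on them.

import re
from dataclasses import dataclass, field
from typing import List, Set

@dataclass
class DialogueState:
    """Schema elements and context mentions accumulated over the dialogue."""
    linked_schema: Set[str] = field(default_factory=set)
    context_mentions: List[str] = field(default_factory=list)

def extract_schema(utterance: str, schema_elements: Set[str]) -> Set[str]:
    """Stand-in for a schema extractor: match schema names appearing in the utterance."""
    tokens = set(re.findall(r"[a-z_]+", utterance.lower()))
    return {e for e in schema_elements if e.split(".")[-1] in tokens}

def extract_context(utterance: str) -> List[str]:
    """Stand-in for a context extractor: keep content words likely to be referred back to."""
    return [w for w in re.findall(r"[a-z_]+", utterance.lower()) if len(w) > 3]

def update_state(state: DialogueState, utterance: str, schema_elements: Set[str]) -> DialogueState:
    state.linked_schema |= extract_schema(utterance, schema_elements)
    state.context_mentions.extend(extract_context(utterance))
    return state

if __name__ == "__main__":
    schema = {"singer.name", "singer.age", "concert.year"}
    state = DialogueState()
    for turn in ["Show the name of every singer.", "Only those older than a given age."]:
        state = update_state(state, turn, schema)
    print(state.linked_schema)      # {'singer.name', 'singer.age'}
    print(state.context_mentions)   # words carried forward from earlier turns

The point of the sketch is only the state-tracking pattern: each turn's extraction is added to a persistent state, so follow-up questions that omit table or column names can still be resolved against what was linked earlier.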