Adapt and Decompose: Efficient Generalization of Text-to-SQL via Domain Adapted Least-To-Most Prompting
Aseem Arora, Shabbirhussain Bhaisaheb, Harshit Nigam, Manasi Patwardhan, Lovekesh Vig, Gautam Shroff
Abstract
Cross-domain and cross-compositional generalization of Text-to-SQL semantic parsing is a challenging task. Existing Large Language Model (LLM) based solutions rely on inference-time retrieval of few-shot exemplars from the training set to synthesize a run-time prompt for each Natural Language (NL) test query. In contrast, we devise an algorithm which performs offline sampling of a minimal set of few-shots from the training data, with complete coverage of SQL clauses, operators and functions, and maximal domain coverage within the allowed token length. This allows for synthesis of a fixed Generic Prompt (GP), with a diverse set of exemplars common across NL test queries, avoiding expensive test-time exemplar retrieval. We further auto-adapt the GP to the target database domain (DA-GP) to better handle cross-domain generalization, followed by decomposed Least-To-Most Prompting (LTMP-DA-GP) to handle cross-compositional generalization. The synthesis of LTMP-DA-GP is an offline task, performed once per new database with minimal human intervention. Our approach demonstrates superior performance on the KaggleDBQA dataset, designed to evaluate generalizability for the Text-to-SQL task. We further showcase consistent performance improvement of LTMP-DA-GP over GP, across LLMs and databases of KaggleDBQA, highlighting the efficacy and model-agnostic benefits of our prompt-based adapt and decompose approach.
- Anthology ID: 2023.genbench-1.3
- Volume: Proceedings of the 1st GenBench Workshop on (Benchmarking) Generalisation in NLP
- Month: December
- Year: 2023
- Address: Singapore
- Editors: Dieuwke Hupkes, Verna Dankers, Khuyagbaatar Batsuren, Koustuv Sinha, Amirhossein Kazemnejad, Christos Christodoulopoulos, Ryan Cotterell, Elia Bruni
- Venues: GenBench | WS
- Publisher: Association for Computational Linguistics
- Pages: 25–47
- URL: https://aclanthology.org/2023.genbench-1.3
- DOI: 10.18653/v1/2023.genbench-1.3
- Cite (ACL): Aseem Arora, Shabbirhussain Bhaisaheb, Harshit Nigam, Manasi Patwardhan, Lovekesh Vig, and Gautam Shroff. 2023. Adapt and Decompose: Efficient Generalization of Text-to-SQL via Domain Adapted Least-To-Most Prompting. In Proceedings of the 1st GenBench Workshop on (Benchmarking) Generalisation in NLP, pages 25–47, Singapore. Association for Computational Linguistics.
- Cite (Informal): Adapt and Decompose: Efficient Generalization of Text-to-SQL via Domain Adapted Least-To-Most Prompting (Arora et al., GenBench-WS 2023)
- PDF: https://preview.aclanthology.org/nschneid-patch-1/2023.genbench-1.3.pdf
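The abstract's offline exemplar-selection step — choosing a minimal set of few-shots that covers all SQL clauses, operators and functions within an allowed token length — can be read as a budgeted set-cover problem. The sketch below is an illustrative greedy heuristic under that reading, not the paper's actual algorithm; all names and the feature representation are hypothetical.

```python
# Hypothetical sketch of offline exemplar selection as greedy set cover.
# Each candidate exemplar carries the set of SQL features (clauses,
# operators, functions) its query uses, plus a token cost.

def select_exemplars(examples, token_budget):
    """Greedily pick exemplars until all features are covered or the
    token budget runs out.

    examples: list of (text, features, n_tokens) tuples, where
    `features` is a set of SQL clauses/operators/functions.
    """
    uncovered = set().union(*(feats for _, feats, _ in examples))
    chosen, used_tokens = [], 0
    while uncovered:
        # Candidates that still fit in the remaining token budget.
        candidates = [e for e in examples
                      if e not in chosen and used_tokens + e[2] <= token_budget]
        if not candidates:
            break  # budget exhausted before full coverage
        # Take the exemplar covering the most still-uncovered features.
        best = max(candidates, key=lambda e: len(e[1] & uncovered))
        if not best[1] & uncovered:
            break  # no remaining candidate adds coverage
        chosen.append(best)
        used_tokens += best[2]
        uncovered -= best[1]
    return chosen

# Toy usage with made-up exemplars and token counts:
pool = [
    ("q1", {"SELECT", "WHERE"}, 30),
    ("q2", {"SELECT", "GROUP BY", "COUNT"}, 40),
    ("q3", {"SELECT", "ORDER BY"}, 25),
]
prompt_exemplars = select_exemplars(pool, token_budget=100)
```

Because selection runs once offline, the resulting exemplar set can be baked into a fixed Generic Prompt shared across all test queries, rather than retrieved per query at inference time.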