Project PRIMUS at EHRSQL 2024 : Text-to-SQL Generation using Large Language Model for EHR Analysis
Sourav Joy, Rohan Ahmed, Argha Saha, Minhaj Habil, Utsho Das, Partha Bhowmik
Abstract
This paper explores the application of the sqlcoders model, a pre-trained neural network, for automatic SQL query generation from natural language questions. We focus on the model’s internal functionality and demonstrate its effectiveness on a domain-specific validation dataset provided by EHRSQL. The sqlcoders model, based on transformers with attention mechanisms, has been trained on paired examples of natural language questions and corresponding SQL queries. It takes advantage of a carefully crafted prompt that incorporates the database schema alongside the question to guide the model towards the desired output format.
- Anthology ID: 2024.clinicalnlp-1.41
- Volume: Proceedings of the 6th Clinical Natural Language Processing Workshop
- Month: June
- Year: 2024
- Address: Mexico City, Mexico
- Editors: Tristan Naumann, Asma Ben Abacha, Steven Bethard, Kirk Roberts, Danielle Bitterman
- Venues: ClinicalNLP | WS
- Publisher: Association for Computational Linguistics
- Pages: 422–427
- URL: https://aclanthology.org/2024.clinicalnlp-1.41
- Cite (ACL): Sourav Joy, Rohan Ahmed, Argha Saha, Minhaj Habil, Utsho Das, and Partha Bhowmik. 2024. Project PRIMUS at EHRSQL 2024 : Text-to-SQL Generation using Large Language Model for EHR Analysis. In Proceedings of the 6th Clinical Natural Language Processing Workshop, pages 422–427, Mexico City, Mexico. Association for Computational Linguistics.
- Cite (Informal): Project PRIMUS at EHRSQL 2024 : Text-to-SQL Generation using Large Language Model for EHR Analysis (Joy et al., ClinicalNLP-WS 2024)
- PDF: https://preview.aclanthology.org/fix-volume-bibkeys/2024.clinicalnlp-1.41.pdf
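
The abstract describes a prompt that places the database schema alongside the natural language question. A minimal sketch of that idea follows. The template wording, the function name `build_prompt`, and the toy `patients` schema are all assumptions made for illustration; they are not taken from the paper and may differ from the prompt the authors actually used.

```python
def build_prompt(question: str, schema: str) -> str:
    """Combine a question and a database schema into one prompt string.

    The section headers below are a hypothetical template, not the
    authors' actual prompt.
    """
    return (
        "### Task\n"
        f"Generate a SQL query that answers the question: {question}\n\n"
        "### Database Schema\n"
        f"{schema.strip()}\n\n"
        "### SQL\n"
    )


# Toy schema, invented for this example (not the EHRSQL schema).
example_schema = """
CREATE TABLE patients (
    subject_id INTEGER PRIMARY KEY,
    gender TEXT,
    dob TEXT
);
"""

example_question = "How many patients are female?"

# The resulting string would be passed to a text-to-SQL model, which is
# expected to continue the text after the final "### SQL" header.
prompt = build_prompt(example_question, example_schema)
print(prompt)
```

Ending the prompt at the SQL header is one common way to steer a generative model towards emitting only the query, which matches the abstract's stated goal of guiding the output format.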