Project PRIMUS at EHRSQL 2024 : Text-to-SQL Generation using Large Language Model for EHR Analysis

Sourav Joy, Rohan Ahmed, Argha Saha, Minhaj Habil, Utsho Das, Partha Bhowmik


Abstract
This paper explores the application of the sqlcoders model, a pre-trained neural network, for automatic SQL query generation from natural language questions. We focus on the model’s internal functionality and demonstrate its effectiveness on a domain-specific validation dataset provided by EHRSQL. The sqlcoders model, based on transformers with attention mechanisms, has been trained on paired examples of natural language questions and corresponding SQL queries. It takes advantage of a carefully crafted prompt that incorporates the database schema alongside the question to guide the model towards the desired output format.
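The abstract says that the prompt pairs the database schema with the question but does not reproduce the template. The Python sketch below shows one way such a schema-aware prompt could be assembled. The template wording, the section markers, the helper name `build_prompt`, and the toy EHR-style schema are all hypothetical illustrations. They are not the authors' prompt, the EHRSQL schema, or the official sqlcoders input format.

```python
# Hypothetical sketch of schema-aware prompt construction for a text-to-SQL
# model. The template text, section markers, and toy schema are illustrative
# assumptions, not the prompt used in the paper.

PROMPT_TEMPLATE = """### Task
Generate a SQL query that answers the following question:
{question}

### Database Schema
{schema}

### SQL
"""


def build_prompt(question: str, schema_ddl: list[str]) -> str:
    """Combine a natural-language question with the database schema.

    schema_ddl is a list of CREATE TABLE statements. Including them gives the
    model the table and column names it is allowed to reference, and ending
    the prompt with a "### SQL" marker nudges it to emit only the query.
    """
    schema = "\n".join(stmt.strip() for stmt in schema_ddl)
    return PROMPT_TEMPLATE.format(question=question.strip(), schema=schema)


# Hypothetical toy schema loosely resembling EHR tables (not the EHRSQL schema).
TOY_SCHEMA = [
    "CREATE TABLE patients (subject_id INTEGER, gender TEXT, dob TEXT);",
    "CREATE TABLE admissions (hadm_id INTEGER, subject_id INTEGER, admittime TEXT);",
]

if __name__ == "__main__":
    print(build_prompt("How many female patients are there?", TOY_SCHEMA))
```

The resulting string would be passed to the model as its input. Placing the schema before the answer marker is a common design choice because it lets the model ground table and column names before it generates SQL.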
Anthology ID:
2024.clinicalnlp-1.41
Volume:
Proceedings of the 6th Clinical Natural Language Processing Workshop
Month:
June
Year:
2024
Address:
Mexico City, Mexico
Editors:
Tristan Naumann, Asma Ben Abacha, Steven Bethard, Kirk Roberts, Danielle Bitterman
Venues:
ClinicalNLP | WS
Publisher:
Association for Computational Linguistics
Pages:
422–427
URL:
https://aclanthology.org/2024.clinicalnlp-1.41
Cite (ACL):
Sourav Joy, Rohan Ahmed, Argha Saha, Minhaj Habil, Utsho Das, and Partha Bhowmik. 2024. Project PRIMUS at EHRSQL 2024 : Text-to-SQL Generation using Large Language Model for EHR Analysis. In Proceedings of the 6th Clinical Natural Language Processing Workshop, pages 422–427, Mexico City, Mexico. Association for Computational Linguistics.
Cite (Informal):
Project PRIMUS at EHRSQL 2024 : Text-to-SQL Generation using Large Language Model for EHR Analysis (Joy et al., ClinicalNLP-WS 2024)
PDF:
https://preview.aclanthology.org/jeptaln-2024-ingestion/2024.clinicalnlp-1.41.pdf