AIRI NLP Team at EHRSQL 2024 Shared Task: T5 and Logistic Regression to the Rescue

Oleg Somov, Alexey Dontsov, Elena Tutubalina


Abstract
This paper presents a system developed for the Clinical NLP 2024 Shared Task, which focuses on reliable text-to-SQL modeling on Electronic Health Records (EHRs). The goal is to build a model that accurately generates SQL queries for answerable questions while avoiding incorrect responses and correctly handling unanswerable ones. Our approach comprises three main components: a query correspondence model, a text-to-SQL model, and an SQL verifier. For the query correspondence model, we trained a logistic regression model on hand-crafted features to distinguish answerable from unanswerable queries. For the text-to-SQL model, we fine-tuned the pretrained T5-3B language model on pairs of natural language questions and their corresponding SQL queries. Finally, the SQL verifier inspects the generated SQL queries. During the evaluation stage of the shared task, our system achieved an accuracy of 68.9% (metric version without penalty), placing fifth. Although our approach did not surpass solutions based on large language models (LLMs) such as ChatGPT, it demonstrates the promising potential of more resource-efficient, domain-specific models. The code is publicly available at https://github.com/runnerup96/EHRSQL-text2sql-solution.
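The abstract describes a three-stage pipeline: an answerability classifier (logistic regression), a text-to-SQL generator (fine-tuned T5-3B), and an SQL verifier. The Python sketch below illustrates only how such stages could be chained so that either the classifier or the verifier triggers abstention; it is not the authors' implementation. The features in the hypothetical `extract_features`, the 0.5 `threshold`, the stub generator standing in for T5-3B, and the use of SQLite `EXPLAIN` as the verification check are all assumptions made for illustration.

```python
import math
import sqlite3
from typing import Callable, Optional

# HYPOTHETICAL hand-crafted features; the paper's actual feature set is not given here.
DOMAIN_WORDS = ("patient", "admission", "prescribed", "diagnosed")
OUT_OF_SCOPE_CUES = ("why", "should", "recommend", "best")


def extract_features(question: str) -> list[float]:
    q = question.lower()
    return [
        1.0,  # bias term
        len(q.split()) / 20.0,  # scaled question length
        1.0 if any(w in q for w in DOMAIN_WORDS) else 0.0,  # mentions EHR concepts
        1.0 if any(w in q for w in OUT_OF_SCOPE_CUES) else 0.0,  # advice/opinion cue
    ]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict_answerable(weights: list[float], question: str) -> float:
    """Probability that the question can be answered from the database."""
    x = extract_features(question)
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)))


def train_logreg(questions: list[str], labels: list[int],
                 lr: float = 0.5, epochs: int = 500) -> list[float]:
    """Fit logistic regression with full-batch gradient descent on log loss."""
    feats = [extract_features(q) for q in questions]
    weights = [0.0] * len(feats[0])
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for x, y in zip(feats, labels):
            err = sigmoid(sum(w * xi for w, xi in zip(weights, x))) - y
            for j, xj in enumerate(x):
                grad[j] += err * xj
        weights = [w - lr * g / len(feats) for w, g in zip(weights, grad)]
    return weights


def verify_sql(sql: str, conn: sqlite3.Connection) -> bool:
    """ASSUMED verifier: accept the query only if SQLite can compile it against the schema."""
    try:
        conn.execute("EXPLAIN " + sql)  # compiles without modifying any data
        return True
    except (sqlite3.Error, sqlite3.Warning):
        return False


def answer(question: str, weights: list[float],
           generate_sql: Callable[[str], str],
           conn: sqlite3.Connection, threshold: float = 0.5) -> Optional[str]:
    """Return SQL for questions judged answerable, or None to abstain."""
    if predict_answerable(weights, question) < threshold:
        return None  # stage 1: classifier predicts "unanswerable"
    sql = generate_sql(question)  # stage 2: stand-in for the text-to-SQL model
    if not verify_sql(sql, conn):
        return None  # stage 3: verifier rejects the query
    return sql


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE patients (subject_id INTEGER, gender TEXT)")
    train_q = ["how many patients were admitted",
               "which drug was prescribed to patient 10",
               "why should a patient take aspirin",
               "what is the best hospital"]
    weights = train_logreg(train_q, [1, 1, 0, 0])
    stub = lambda q: "SELECT COUNT(*) FROM patients"  # hypothetical generator
    print(answer("how many patients were admitted", weights, stub, conn))
    print(answer("why should a patient take aspirin", weights, stub, conn))
```

Placing abstention at two points reflects the stated goal of avoiding incorrect responses: a question can be rejected before generation, and a generated query that does not compile can still be withheld.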
Anthology ID: 2024.clinicalnlp-1.43
Volume: Proceedings of the 6th Clinical Natural Language Processing Workshop
Month: June
Year: 2024
Address: Mexico City, Mexico
Editors: Tristan Naumann, Asma Ben Abacha, Steven Bethard, Kirk Roberts, Danielle Bitterman
Venues: ClinicalNLP | WS
Publisher: Association for Computational Linguistics
Pages: 431–438
URL: https://aclanthology.org/2024.clinicalnlp-1.43
Cite (ACL): Oleg Somov, Alexey Dontsov, and Elena Tutubalina. 2024. AIRI NLP Team at EHRSQL 2024 Shared Task: T5 and Logistic Regression to the Rescue. In Proceedings of the 6th Clinical Natural Language Processing Workshop, pages 431–438, Mexico City, Mexico. Association for Computational Linguistics.
Cite (Informal): AIRI NLP Team at EHRSQL 2024 Shared Task: T5 and Logistic Regression to the Rescue (Somov et al., ClinicalNLP-WS 2024)
PDF: https://preview.aclanthology.org/jeptaln-2024-ingestion/2024.clinicalnlp-1.43.pdf