EHR-SeqSQL : A Sequential Text-to-SQL Dataset For Interactively Exploring Electronic Health Records

Jaehee Ryu, Seonhee Cho, Gyubok Lee, Edward Choi


Abstract
In this paper, we introduce EHR-SeqSQL, a novel sequential text-to-SQL dataset for Electronic Health Record (EHR) databases. EHR-SeqSQL is designed to address critical yet underexplored aspects of text-to-SQL parsing: interactivity, compositionality, and efficiency. To the best of our knowledge, EHR-SeqSQL is not only the largest medical text-to-SQL benchmark but also the first to include sequential and contextual questions. We provide a data split and a new test set designed to assess compositional generalization ability. Our experiments demonstrate the superiority of a multi-turn approach over a single-turn approach in learning compositionality. Additionally, our dataset integrates specially crafted tokens into SQL queries to improve execution efficiency. With EHR-SeqSQL, we aim to bridge the gap between practical needs and academic research in the text-to-SQL domain.
Anthology ID:
2024.findings-acl.971
Volume:
Findings of the Association for Computational Linguistics: ACL 2024
Month:
August
Year:
2024
Address:
Bangkok, Thailand
Editors:
Lun-Wei Ku, Andre Martins, Vivek Srikumar
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
16388–16407
URL:
https://preview.aclanthology.org/jlcl-multiple-ingestion/2024.findings-acl.971/
DOI:
10.18653/v1/2024.findings-acl.971
Cite (ACL):
Jaehee Ryu, Seonhee Cho, Gyubok Lee, and Edward Choi. 2024. EHR-SeqSQL : A Sequential Text-to-SQL Dataset For Interactively Exploring Electronic Health Records. In Findings of the Association for Computational Linguistics: ACL 2024, pages 16388–16407, Bangkok, Thailand. Association for Computational Linguistics.
Cite (Informal):
EHR-SeqSQL : A Sequential Text-to-SQL Dataset For Interactively Exploring Electronic Health Records (Ryu et al., Findings 2024)
PDF:
https://preview.aclanthology.org/jlcl-multiple-ingestion/2024.findings-acl.971.pdf