emrKBQA: A Clinical Knowledge-Base Question Answering Dataset
Preethi Raghavan, Jennifer J Liang, Diwakar Mahajan, Rachita Chandra, Peter Szolovits
Abstract
We present emrKBQA, a dataset for answering physician questions from a structured patient record. It consists of questions, logical forms and answers. The questions and logical forms are generated based on real-world physician questions and are slot-filled and answered from patients in the MIMIC-III KB through a semi-automated process. This community-shared release consists of over 940,000 question, logical form and answer triplets with 389 types of questions and ~7.5 paraphrases per question type. We perform experiments to validate the quality of the dataset and set benchmarks for question-to-logical-form learning that helps answer questions on this dataset.
- Anthology ID:
- 2021.bionlp-1.7
- Volume:
- Proceedings of the 20th Workshop on Biomedical Language Processing
- Month:
- June
- Year:
- 2021
- Address:
- Online
- Editors:
- Dina Demner-Fushman, Kevin Bretonnel Cohen, Sophia Ananiadou, Junichi Tsujii
- Venue:
- BioNLP
- SIG:
- SIGBIOMED
- Publisher:
- Association for Computational Linguistics
- Pages:
- 64–73
- URL:
- https://aclanthology.org/2021.bionlp-1.7
- DOI:
- 10.18653/v1/2021.bionlp-1.7
- Cite (ACL):
- Preethi Raghavan, Jennifer J Liang, Diwakar Mahajan, Rachita Chandra, and Peter Szolovits. 2021. emrKBQA: A Clinical Knowledge-Base Question Answering Dataset. In Proceedings of the 20th Workshop on Biomedical Language Processing, pages 64–73, Online. Association for Computational Linguistics.
- Cite (Informal):
- emrKBQA: A Clinical Knowledge-Base Question Answering Dataset (Raghavan et al., BioNLP 2021)
- PDF:
- https://preview.aclanthology.org/proper-vol2-ingestion/2021.bionlp-1.7.pdf
- Code:
- emrqa/emrkbqa
- Data:
- emrQA