Abstract
This paper discusses the creation of a semantically annotated corpus of questions about patient data in electronic health records (EHRs). The goal is to provide the training data necessary for semantic parsers to automatically convert EHR questions into structured queries. We use a layered annotation strategy that mirrors a typical natural language processing (NLP) pipeline. First, questions are syntactically analyzed to identify multi-part questions. Second, medical concepts are recognized and normalized to a clinical ontology. Finally, logical forms are created using a lambda calculus representation. We use a corpus of 446 questions asking for patient-specific information. From these, 468 specific questions are found, containing 259 unique medical concepts and requiring 53 unique predicates to represent the logical forms. We further present detailed characteristics of the corpus, including inter-annotator agreement results, and describe the challenges automatic NLP systems will face on this task.
- Anthology ID:
- L16-1598
- Volume:
- Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
- Month:
- May
- Year:
- 2016
- Address:
- Portorož, Slovenia
- Venue:
- LREC
- Publisher:
- European Language Resources Association (ELRA)
- Pages:
- 3772–3778
- URL:
- https://aclanthology.org/L16-1598
- Cite (ACL):
- Kirk Roberts and Dina Demner-Fushman. 2016. Annotating Logical Forms for EHR Questions. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), pages 3772–3778, Portorož, Slovenia. European Language Resources Association (ELRA).
- Cite (Informal):
- Annotating Logical Forms for EHR Questions (Roberts & Demner-Fushman, LREC 2016)
- PDF:
- https://preview.aclanthology.org/paclic-22-ingestion/L16-1598.pdf
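
The three annotation layers named in the abstract (splitting multi-part questions, normalizing medical concepts, attaching a logical form) can be pictured as one record per specific question. The following is only a rough sketch under stated assumptions, not the authors' annotation format or tooling: the example question, the `LEXICON` entries, the `CONCEPT_…` identifiers, and the predicate names `exists`, `latest`, and `has_concept` are all hypothetical placeholders, and the splitting and matching rules are deliberately naive stand-ins for the syntactic analysis and ontology normalization the paper describes.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    text: str        # surface form found in the question
    concept_id: str  # placeholder ID, not a real ontology code


@dataclass
class AnnotatedQuestion:
    text: str
    concepts: list = field(default_factory=list)
    logical_form: str = ""  # written by hand in this sketch


# Hypothetical lexicon standing in for a clinical ontology lookup.
LEXICON = {"insulin": "CONCEPT_0001", "glucose": "CONCEPT_0002"}


def split_multipart(raw):
    """Layer 1 stand-in: split a multi-part question at question marks."""
    return [part.strip() + "?" for part in raw.split("?") if part.strip()]


def find_concepts(question):
    """Layer 2 stand-in: match lexicon terms by case-insensitive substring."""
    lowered = question.lower()
    return [Concept(term, cid) for term, cid in LEXICON.items() if term in lowered]


def annotate(raw, logical_forms):
    """Combine the layers; layer 3 (logical forms) is supplied manually."""
    parts = split_multipart(raw)
    return [
        AnnotatedQuestion(q, find_concepts(q), lf)
        for q, lf in zip(parts, logical_forms)
    ]


# Hypothetical two-part question and hand-written lambda-style forms.
raw = "Is she on insulin? What was her last glucose level?"
forms = [
    "exists(lambda x. has_concept(x, CONCEPT_0001))",
    "latest(lambda x. has_concept(x, CONCEPT_0002))",
]
annotated = annotate(raw, forms)
for item in annotated:
    print(item.text, [c.concept_id for c in item.concepts], item.logical_form)
```

One input question yields two records here, which illustrates how a corpus of 446 questions can produce a larger number of specific questions once multi-part questions are separated.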