Abstract
This paper proposes a dataset and method for automatically generating paraphrases of clinical questions about patient-specific information in electronic health records (EHRs). Crowdsourcing is used to collect 10,578 unique questions across 946 semantically distinct paraphrase clusters. This corpus is then used with a deep learning-based question paraphrasing method that uses a variational autoencoder and an LSTM encoder/decoder. The ultimate aim of such a method is to improve the performance of automatic question answering methods for EHRs.
- Anthology ID:
- W19-5003
- Volume:
- Proceedings of the 18th BioNLP Workshop and Shared Task
- Month:
- August
- Year:
- 2019
- Address:
- Florence, Italy
- Editors:
- Dina Demner-Fushman, Kevin Bretonnel Cohen, Sophia Ananiadou, Junichi Tsujii
- Venue:
- BioNLP
- SIG:
- SIGBIOMED
- Publisher:
- Association for Computational Linguistics
- Pages:
- 20–29
- URL:
- https://preview.aclanthology.org/add_missing_videos/W19-5003/
- DOI:
- 10.18653/v1/W19-5003
- Cite (ACL):
- Sarvesh Soni and Kirk Roberts. 2019. A Paraphrase Generation System for EHR Question Answering. In Proceedings of the 18th BioNLP Workshop and Shared Task, pages 20–29, Florence, Italy. Association for Computational Linguistics.
- Cite (Informal):
- A Paraphrase Generation System for EHR Question Answering (Soni & Roberts, BioNLP 2019)
- PDF:
- https://preview.aclanthology.org/add_missing_videos/W19-5003.pdf