Infusing Disease Knowledge into BERT for Health Question Answering, Medical Inference and Disease Name Recognition

Yun He, Ziwei Zhu, Yin Zhang, Qin Chen, James Caverlee


Abstract
Knowledge of a disease includes information about various aspects of the disease, such as signs and symptoms, diagnosis, and treatment. This disease knowledge is critical for many health-related and biomedical tasks, including consumer health question answering, medical language inference, and disease name recognition. While pre-trained language models like BERT have shown success in capturing syntactic, semantic, and world knowledge from text, we find they can be further complemented by specific information like knowledge of symptoms, diagnoses, treatments, and other disease aspects. Hence, we integrate BERT with disease knowledge to improve these important tasks. Specifically, we propose a new disease knowledge infusion training procedure and evaluate it on a suite of BERT models including BERT, BioBERT, SciBERT, ClinicalBERT, BlueBERT, and ALBERT. Experiments over the three tasks show that these models can be enhanced in nearly all cases, demonstrating the viability of disease knowledge infusion. For example, the accuracy of BioBERT on consumer health question answering improves from 68.29% to 72.09%, and new state-of-the-art (SOTA) results are observed on two datasets. We make our data and code freely available.
Anthology ID:
2020.emnlp-main.372
Volume:
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
Month:
November
Year:
2020
Address:
Online
Editors:
Bonnie Webber, Trevor Cohn, Yulan He, Yang Liu
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
4604–4614
URL:
https://aclanthology.org/2020.emnlp-main.372
DOI:
10.18653/v1/2020.emnlp-main.372
Cite (ACL):
Yun He, Ziwei Zhu, Yin Zhang, Qin Chen, and James Caverlee. 2020. Infusing Disease Knowledge into BERT for Health Question Answering, Medical Inference and Disease Name Recognition. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 4604–4614, Online. Association for Computational Linguistics.
Cite (Informal):
Infusing Disease Knowledge into BERT for Health Question Answering, Medical Inference and Disease Name Recognition (He et al., EMNLP 2020)
PDF:
https://preview.aclanthology.org/nschneid-patch-5/2020.emnlp-main.372.pdf
Video:
https://slideslive.com/38939241
Code:
heyunh2015/diseaseBERT
Data:
BLUE