EntityBERT: Entity-centric Masking Strategy for Model Pretraining for the Clinical Domain
Chen Lin, Timothy Miller, Dmitriy Dligach, Steven Bethard, Guergana Savova
Abstract
Transformer-based neural language models have led to breakthroughs for a variety of natural language processing (NLP) tasks. However, most models are pretrained on general domain data. We propose a methodology to produce a model focused on the clinical domain: continued pretraining of a model with a broad representation of biomedical terminology (PubMedBERT) on a clinical corpus along with a novel entity-centric masking strategy to infuse domain knowledge in the learning process. We show that such a model achieves superior results on clinical extraction tasks by comparing our entity-centric masking strategy with classic random masking on three clinical NLP tasks: cross-domain negation detection, document time relation (DocTimeRel) classification, and temporal relation extraction. We also evaluate our models on the PubMedQA dataset to measure the models’ performance on a non-entity-centric task in the biomedical domain. The language addressed in this work is English.
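As a rough illustration of the entity-centric masking idea described in the abstract, the sketch below masks whole pre-annotated entity spans first and only falls back to random tokens to reach the usual BERT-style masking rate. This is a minimal sketch, assuming entity spans are already available as token offsets; the function and variable names are hypothetical and do not correspond to the authors' released implementation.

```python
import random

MASK_TOKEN = "[MASK]"

def entity_centric_mask(tokens, entity_spans, mask_rate=0.15, seed=0):
    """Mask whole entity spans first, then fall back to random tokens.

    tokens       : list of token strings
    entity_spans : list of (start, end) token-index pairs marking
                   clinical entities (end exclusive)
    mask_rate    : overall fraction of tokens to mask (BERT-style 15%)

    Returns (masked_tokens, labels) where labels holds the original
    token at each masked position and None elsewhere.
    """
    rng = random.Random(seed)
    n_to_mask = max(1, int(round(len(tokens) * mask_rate)))

    masked = list(tokens)
    labels = [None] * len(tokens)
    chosen = set()

    # 1) Prefer positions inside entity spans, masking each span whole.
    for start, end in rng.sample(entity_spans, len(entity_spans)):
        if len(chosen) >= n_to_mask:
            break
        for i in range(start, end):
            chosen.add(i)

    # 2) Top up with random non-entity positions to reach the target rate.
    remaining = [i for i in range(len(tokens)) if i not in chosen]
    rng.shuffle(remaining)
    for i in remaining:
        if len(chosen) >= n_to_mask:
            break
        chosen.add(i)

    # Replace the chosen positions with [MASK], remembering the originals.
    for i in chosen:
        labels[i] = masked[i]
        masked[i] = MASK_TOKEN
    return masked, labels


if __name__ == "__main__":
    toks = "the patient denies chest pain since last tuesday".split()
    spans = [(3, 5), (6, 8)]  # "chest pain", "last tuesday"
    print(entity_centric_mask(toks, spans, mask_rate=0.3))
```

The design choice captured here is simply that masked-language-model supervision is concentrated on clinical entities (e.g., signs/symptoms, times) rather than spread uniformly over all tokens; the exact mixing of entity and random masking in the paper may differ.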
- Anthology ID: 2021.bionlp-1.21
- Volume: Proceedings of the 20th Workshop on Biomedical Language Processing
- Month: June
- Year: 2021
- Address: Online
- Editors: Dina Demner-Fushman, Kevin Bretonnel Cohen, Sophia Ananiadou, Junichi Tsujii
- Venue: BioNLP
- SIG: SIGBIOMED
- Publisher: Association for Computational Linguistics
- Pages: 191–201
- Language: English
- URL: https://aclanthology.org/2021.bionlp-1.21
- DOI: 10.18653/v1/2021.bionlp-1.21
- Cite (ACL): Chen Lin, Timothy Miller, Dmitriy Dligach, Steven Bethard, and Guergana Savova. 2021. EntityBERT: Entity-centric Masking Strategy for Model Pretraining for the Clinical Domain. In Proceedings of the 20th Workshop on Biomedical Language Processing, pages 191–201, Online. Association for Computational Linguistics.
- Cite (Informal): EntityBERT: Entity-centric Masking Strategy for Model Pretraining for the Clinical Domain (Lin et al., BioNLP 2021)
- PDF: https://preview.aclanthology.org/ml4al-ingestion/2021.bionlp-1.21.pdf
- Data: PubMedQA