Assessment of DistilBERT performance on Named Entity Recognition task for the detection of Protected Health Information and medical concepts

Macarious Abadeer


Abstract
Bidirectional Encoder Representations from Transformers (BERT) models achieve state-of-the-art performance on a number of Natural Language Processing tasks. However, their model size on disk often exceeds 1 GB, and fine-tuning them or running inference with them consumes significant hardware resources and runtime, which makes them hard to deploy to production environments. This paper fine-tunes DistilBERT, a lightweight deep learning model, on medical text for the named entity recognition task of Protected Health Information (PHI) and medical concepts. This work provides a full assessment of the performance of DistilBERT in comparison with BERT models that were pre-trained on medical text. For the named entity recognition of PHI, DistilBERT achieved almost the same F1 score as medical versions of BERT at almost half the runtime, while consuming approximately half the disk space. On the other hand, for the detection of medical concepts, DistilBERT’s F1 score was lower by 4 points on average than that of the medical BERT variants.
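
For readers who want to set up a comparable experiment, the sketch below shows one common way to fine-tune DistilBERT for token classification (NER) with the Hugging Face Transformers library. The BIO tag set, the toy training example, and the hyperparameters are illustrative assumptions and are not taken from the paper; the paper's own datasets and settings are described in the full text.

```python
# A minimal sketch of DistilBERT fine-tuning for NER with Hugging Face
# Transformers. The label set, toy example, and hyperparameters below are
# illustrative assumptions, not details taken from the paper.
from datasets import Dataset
from transformers import (
    AutoTokenizer,
    AutoModelForTokenClassification,
    DataCollatorForTokenClassification,
    TrainingArguments,
    Trainer,
)

# Hypothetical BIO tag inventory for PHI-style entities.
labels = ["O", "B-NAME", "I-NAME", "B-DATE", "I-DATE", "B-ID", "I-ID"]
label2id = {l: i for i, l in enumerate(labels)}
id2label = {i: l for l, i in label2id.items()}

model_name = "distilbert-base-cased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(
    model_name, num_labels=len(labels), id2label=id2label, label2id=label2id
)

def tokenize_and_align(example):
    # Tokenize pre-split words and align word-level BIO tags to sub-word
    # pieces; special tokens receive the ignore index -100.
    enc = tokenizer(example["tokens"], is_split_into_words=True, truncation=True)
    enc["labels"] = [
        -100 if w is None else label2id[example["ner_tags"][w]]
        for w in enc.word_ids()
    ]
    return enc

# A single synthetic training record, standing in for a real clinical corpus.
train_dataset = Dataset.from_dict({
    "tokens": [["John", "Smith", "was", "admitted", "on", "01/02/2014", "."]],
    "ner_tags": [["B-NAME", "I-NAME", "O", "O", "O", "B-DATE", "O"]],
}).map(tokenize_and_align, remove_columns=["tokens", "ner_tags"])

args = TrainingArguments(
    output_dir="distilbert-phi-ner",
    num_train_epochs=3,
    per_device_train_batch_size=16,
)
trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    data_collator=DataCollatorForTokenClassification(tokenizer),
)
trainer.train()
```

The same recipe applies to the medical-concept setting by swapping in a concept tag inventory; DistilBERT's smaller parameter count is what yields the runtime and disk-space savings reported in the abstract.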
Anthology ID:
2020.clinicalnlp-1.18
Volume:
Proceedings of the 3rd Clinical Natural Language Processing Workshop
Month:
November
Year:
2020
Address:
Online
Editors:
Anna Rumshisky, Kirk Roberts, Steven Bethard, Tristan Naumann
Venue:
ClinicalNLP
Publisher:
Association for Computational Linguistics
Pages:
158–167
URL:
https://aclanthology.org/2020.clinicalnlp-1.18
DOI:
10.18653/v1/2020.clinicalnlp-1.18
Bibkey:
Cite (ACL):
Macarious Abadeer. 2020. Assessment of DistilBERT performance on Named Entity Recognition task for the detection of Protected Health Information and medical concepts. In Proceedings of the 3rd Clinical Natural Language Processing Workshop, pages 158–167, Online. Association for Computational Linguistics.
Cite (Informal):
Assessment of DistilBERT performance on Named Entity Recognition task for the detection of Protected Health Information and medical concepts (Abadeer, ClinicalNLP 2020)
PDF:
https://preview.aclanthology.org/emnlp-22-attachments/2020.clinicalnlp-1.18.pdf
Video:
https://slideslive.com/38939824
Data:
BLUE