Inexpensive Domain Adaptation of Pretrained Language Models: Case Studies on Biomedical NER and Covid-19 QA

Nina Poerner; Ulli Waltinger; Hinrich Schütze

doi:10.18653/v1/2020.findings-emnlp.134

Abstract

Domain adaptation of Pretrained Language Models (PTLMs) is typically achieved by unsupervised pretraining on target-domain text. While successful, this approach is expensive in terms of hardware, runtime and CO2 emissions. Here, we propose a cheaper alternative: We train Word2Vec on target-domain text and align the resulting word vectors with the wordpiece vectors of a general-domain PTLM. We evaluate on eight English biomedical Named Entity Recognition (NER) tasks and compare against the recently proposed BioBERT model. We cover over 60% of the BioBERT - BERT F1 delta, at 5% of BioBERT’s CO2 footprint and 2% of its cloud compute cost. We also show how to quickly adapt an existing general-domain Question Answering (QA) model to an emerging domain: the Covid-19 pandemic.
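
The abstract does not say how the alignment step is computed. As a minimal sketch, assuming the alignment is a linear map fitted by least squares on the words that both vocabularies share, the idea could look like the following. The function names, the toy vectors, and the dimensions are all hypothetical and chosen only for illustration; real inputs would be Word2Vec vectors trained on target-domain text and the wordpiece embedding matrix of the general-domain model.

```python
import numpy as np


def fit_alignment(src_vecs, tgt_vecs):
    """Fit a linear map W so that src @ W approximates tgt on the shared vocabulary.

    src_vecs, tgt_vecs: dicts mapping token -> 1-D numpy array.
    This is an assumed alignment method (ordinary least squares),
    not necessarily the one used in the paper.
    """
    shared = sorted(set(src_vecs) & set(tgt_vecs))
    if not shared:
        raise ValueError("the two vocabularies share no tokens")
    X = np.stack([src_vecs[w] for w in shared])  # (n_shared, d_src)
    Y = np.stack([tgt_vecs[w] for w in shared])  # (n_shared, d_tgt)
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)    # (d_src, d_tgt)
    return W


def project_new_words(src_vecs, tgt_vecs, W):
    """Map tokens that exist only in the source space into the target space."""
    return {w: v @ W for w, v in src_vecs.items() if w not in tgt_vecs}


# Toy data: 4-dimensional "domain" vectors and 6-dimensional "model" vectors.
rng = np.random.default_rng(0)
true_map = rng.normal(size=(4, 6))
domain_words = ["cell", "protein", "dose", "virus", "patient", "remdesivir"]
domain_vecs = {w: rng.normal(size=4) for w in domain_words}
# The hypothetical model vocabulary lacks the domain-specific word "remdesivir".
lm_vecs = {w: domain_vecs[w] @ true_map for w in domain_words[:5]}

W = fit_alignment(domain_vecs, lm_vecs)
new_vecs = project_new_words(domain_vecs, lm_vecs, W)
```

In this sketch, `new_vecs` holds vectors for domain-specific words expressed in the model's embedding space, so they could be appended to the model's input vocabulary without further pretraining of the model itself.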

Anthology ID: 2020.findings-emnlp.134
Volume: Findings of the Association for Computational Linguistics: EMNLP 2020
Month: November
Year: 2020
Address: Online
Editors: Trevor Cohn, Yulan He, Yang Liu
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 1482–1490
URL: https://preview.aclanthology.org/fix-sig-urls/2020.findings-emnlp.134/
DOI: 10.18653/v1/2020.findings-emnlp.134
Cite (ACL): Nina Poerner, Ulli Waltinger, and Hinrich Schütze. 2020. Inexpensive Domain Adaptation of Pretrained Language Models: Case Studies on Biomedical NER and Covid-19 QA. In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 1482–1490, Online. Association for Computational Linguistics.
Cite (Informal): Inexpensive Domain Adaptation of Pretrained Language Models: Case Studies on Biomedical NER and Covid-19 QA (Poerner et al., Findings 2020)
PDF: https://preview.aclanthology.org/fix-sig-urls/2020.findings-emnlp.134.pdf
Optional supplementary material: 2020.findings-emnlp.134.OptionalSupplementaryMaterial.pdf
Video: https://slideslive.com/38940121
Code: npoe/covid-qa
Data: BC5CDR
