Are we there yet? Exploring clinical domain knowledge of BERT models

Madhumita Sushil, Simon Suster, Walter Daelemans


Abstract
We explore whether state-of-the-art BERT models encode sufficient domain knowledge to correctly perform domain-specific inference. Although BERT implementations such as BioBERT are better at domain-based reasoning than models trained on general-domain corpora, there is still a wide margin compared to human performance on these tasks. To bridge this gap, we explore whether supplementing textual domain knowledge in the medical NLI task, either a) by further language-model pretraining on medical-domain corpora, b) by means of lexical-match algorithms such as BM25, c) by supplementing lexical retrieval with dependency relations, or d) by using a trained retriever module, can push this performance closer to that of humans. However, we find no significant difference between knowledge-supplemented classification and the baseline BERT models. This is contrary to the results for evidence retrieval on other tasks such as open-domain question answering (QA). By examining the retrieval output, we show that the methods fail because knowledge retrieval for complex domain-specific reasoning is unreliable. We conclude that the task of unsupervised text retrieval to bridge gaps in existing information and facilitate inference is more complex than what state-of-the-art methods can solve, and warrants extensive future research.
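The abstract lists BM25-based lexical retrieval (option b) as one way of supplementing domain knowledge for medical NLI. As a rough illustration of that idea only, the sketch below scores a small set of made-up knowledge snippets against a premise and hypothesis with the standard Okapi BM25 formula; the snippets, tokenizer, and parameter values are placeholders and not the authors' actual retrieval pipeline.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Very simple word tokenizer; a stand-in for real preprocessing."""
    return re.findall(r"\w+", text.lower())

class BM25:
    """Minimal Okapi BM25 scorer over a small list of knowledge snippets."""
    def __init__(self, docs, k1=1.5, b=0.75):
        self.raw = docs
        self.docs = [tokenize(d) for d in docs]
        self.k1, self.b = k1, b
        self.N = len(self.docs)
        self.avgdl = sum(len(d) for d in self.docs) / self.N
        # document frequency of each term
        self.df = Counter(t for d in self.docs for t in set(d))

    def idf(self, term):
        n_t = self.df.get(term, 0)
        return math.log((self.N - n_t + 0.5) / (n_t + 0.5) + 1.0)

    def score(self, query_tokens, idx):
        doc = self.docs[idx]
        tf = Counter(doc)
        total = 0.0
        for term in query_tokens:
            f = tf.get(term, 0)
            denom = f + self.k1 * (1 - self.b + self.b * len(doc) / self.avgdl)
            total += self.idf(term) * f * (self.k1 + 1) / denom
        return total

    def top_k(self, query, k=3):
        q = tokenize(query)
        ranked = sorted(range(self.N), key=lambda i: self.score(q, i), reverse=True)
        return [self.raw[i] for i in ranked[:k]]

# Hypothetical knowledge snippets; the paper retrieves from medical corpora instead.
knowledge = [
    "Metformin is a first-line oral medication for type 2 diabetes.",
    "An elevated troponin level suggests myocardial injury.",
    "Warfarin requires regular INR monitoring of blood clotting.",
]

premise = "The patient was started on metformin after admission."
hypothesis = "The patient has diabetes."

bm25 = BM25(knowledge)
for snippet in bm25.top_k(premise + " " + hypothesis, k=1):
    print(snippet)  # expected: the metformin/diabetes snippet
```

In the paper's setting, the retrieved snippet would then be concatenated with the premise-hypothesis pair before classification; dependency relations (option c) or a trained retriever (option d) replace or refine this lexical scoring step.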
Anthology ID: 2021.bionlp-1.5
Volume: Proceedings of the 20th Workshop on Biomedical Language Processing
Month: June
Year: 2021
Address: Online
Editors: Dina Demner-Fushman, Kevin Bretonnel Cohen, Sophia Ananiadou, Junichi Tsujii
Venue: BioNLP
SIG: SIGBIOMED
Publisher: Association for Computational Linguistics
Pages: 41–53
URL: https://aclanthology.org/2021.bionlp-1.5
DOI: 10.18653/v1/2021.bionlp-1.5
Cite (ACL): Madhumita Sushil, Simon Suster, and Walter Daelemans. 2021. Are we there yet? Exploring clinical domain knowledge of BERT models. In Proceedings of the 20th Workshop on Biomedical Language Processing, pages 41–53, Online. Association for Computational Linguistics.
Cite (Informal): Are we there yet? Exploring clinical domain knowledge of BERT models (Sushil et al., BioNLP 2021)
PDF: https://preview.aclanthology.org/emnlp22-frontmatter/2021.bionlp-1.5.pdf