A Multilevel Analysis of PubMed-only BERT-based Biomedical Models

Vicente Sanchez Carmona, Shanshan Jiang, Bin Dong


Abstract
Biomedical NLP models play a significant role in the automatic extraction of information from biomedical documents, such as COVID research papers. Three landmark models have led the way in this area: BioBERT, MSR BiomedBERT, and BioLinkBERT. However, their shallow evaluation (a single mean score) prevents us from better understanding how the contributions proposed in each model advance the Biomedical NLP field. We show through a Multilevel Analysis how we can assess these contributions. Our analyses across 5000 fine-tuned models show that BiomedBERT's true effect is, in fact, larger than BioLinkBERT's effect, and that the success of BioLinkBERT does not seem to be due to its contribution (the Link function) but to an unknown factor.
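
To illustrate the kind of analysis the abstract describes, the following is a minimal sketch of a multilevel (mixed-effects) analysis over fine-tuned model scores, written in Python with statsmodels. The column names (score, pretrained_model, task), the synthetic data, and the exact model specification are assumptions for illustration only; the paper's actual setup is not given on this page.

    # A minimal sketch of a multilevel (mixed-effects) analysis of
    # fine-tuned model scores. Columns "score", "pretrained_model",
    # and "task" are hypothetical names, not the paper's actual data.
    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(0)

    # Synthetic stand-in for the ~5000 fine-tuning runs: each run is
    # one (pretrained model, task, seed) combination with a score.
    models = ["BioBERT", "BiomedBERT", "BioLinkBERT"]
    tasks = [f"task_{i}" for i in range(8)]
    rows = []
    for model in models:
        for task in tasks:
            for seed in range(20):
                base = 0.80 + 0.02 * models.index(model)   # per-model shift
                task_shift = 0.01 * tasks.index(task)      # task-level variation
                rows.append({
                    "pretrained_model": model,
                    "task": task,
                    "score": base + task_shift + rng.normal(scale=0.01),
                })
    df = pd.DataFrame(rows)

    # Multilevel model: pretrained model as a fixed effect, tasks as
    # groups with random intercepts, so each model's effect is
    # estimated net of task-level variation rather than from a single
    # pooled mean score.
    mlm = smf.mixedlm("score ~ C(pretrained_model)", df, groups=df["task"])
    result = mlm.fit()
    print(result.summary())

Under such a specification, the coefficients on the pretrained-model categories estimate each model's effect while accounting for task-level grouping, which is what allows effects to be compared more reliably than with a single mean score.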
Anthology ID:
2024.clinicalnlp-1.10
Volume:
Proceedings of the 6th Clinical Natural Language Processing Workshop
Month:
June
Year:
2024
Address:
Mexico City, Mexico
Editors:
Tristan Naumann, Asma Ben Abacha, Steven Bethard, Kirk Roberts, Danielle Bitterman
Venues:
ClinicalNLP | WS
Publisher:
Association for Computational Linguistics
Pages:
105–110
URL:
https://aclanthology.org/2024.clinicalnlp-1.10
Cite (ACL):
Vicente Sanchez Carmona, Shanshan Jiang, and Bin Dong. 2024. A Multilevel Analysis of PubMed-only BERT-based Biomedical Models. In Proceedings of the 6th Clinical Natural Language Processing Workshop, pages 105–110, Mexico City, Mexico. Association for Computational Linguistics.
Cite (Informal):
A Multilevel Analysis of PubMed-only BERT-based Biomedical Models (Sanchez Carmona et al., ClinicalNLP-WS 2024)
PDF:
https://preview.aclanthology.org/jeptaln-2024-ingestion/2024.clinicalnlp-1.10.pdf