MeDa-BERT: A medical Danish pretrained transformer model
Jannik Pedersen, Martin Laursen, Pernille Vinholt, Thiusius Rajeeth Savarimuthu
Abstract
This paper introduces a medical Danish BERT-based language model (MeDa-BERT) and medical Danish word embeddings. The word embeddings and MeDa-BERT were pretrained on a new medical Danish corpus consisting of 133M tokens from medical Danish books and text from the internet. The models showed improved performance over general-domain models on medical Danish classification tasks. The medical word embeddings and MeDa-BERT are publicly available.
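Since the abstract states that MeDa-BERT is publicly released, it can presumably be loaded like any other BERT checkpoint. Below is a minimal sketch using the Hugging Face `transformers` library; the model identifier is an assumption for illustration (it is not stated in this abstract) and should be checked against the authors' actual release.

```python
# Minimal sketch: loading a released MeDa-BERT checkpoint and extracting
# contextual embeddings for a Danish medical sentence.
import torch
from transformers import AutoTokenizer, AutoModel

MODEL_ID = "jannikskytt/MeDa-Bert"  # assumed hub id; verify with the authors' release

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModel.from_pretrained(MODEL_ID)

# Tokenize a Danish medical sentence ("The patient was treated with antibiotics.")
text = "Patienten blev behandlet med antibiotika."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

last_hidden = outputs.last_hidden_state  # shape: (1, seq_len, hidden_size)
print(last_hidden.shape)
```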
- Anthology ID: 2023.nodalida-1.31
- Volume: Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa)
- Month: May
- Year: 2023
- Address: Tórshavn, Faroe Islands
- Editors: Tanel Alumäe, Mark Fishel
- Venue: NoDaLiDa
- Publisher: University of Tartu Library
- Pages: 301–307
- URL: https://aclanthology.org/2023.nodalida-1.31
- Cite (ACL): Jannik Pedersen, Martin Laursen, Pernille Vinholt, and Thiusius Rajeeth Savarimuthu. 2023. MeDa-BERT: A medical Danish pretrained transformer model. In Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa), pages 301–307, Tórshavn, Faroe Islands. University of Tartu Library.
- Cite (Informal): MeDa-BERT: A medical Danish pretrained transformer model (Pedersen et al., NoDaLiDa 2023)
- PDF: https://preview.aclanthology.org/nschneid-patch-5/2023.nodalida-1.31.pdf