Building A German Clinical Named Entity Recognition System without In-domain Training Data

Siting Liang, Daniel Sonntag


Abstract
Clinical Named Entity Recognition (NER) is essential for extracting important medical insights from clinical narratives. Expert training datasets for real-world clinical applications are hard to obtain because of data protection regulations and the lack of standardised entity types. This work is a collaborative initiative to build a German clinical NER system that addresses these obstacles. To cope with the scarcity of training data, we propose a Conditional Relevance Learning (CRL) approach for low-resource transfer learning scenarios. CRL leverages a pre-trained language model and domain-specific open resources to obtain a robust base model for clinical NER tasks, particularly when label sets change. This flexibility allows our NER system to implement a Multilayered Semantic Annotation (MSA) schema that organizes a diverse array of entity types, making the system more adaptable and useful across clinical domains. In the case study, we demonstrate how our NER system can be applied to overcome resource constraints and comply with data privacy regulations. Because the system has not been trained on in-domain data, feedback from expert users in the respective domains is essential for identifying areas for refinement. Future work will focus on integrating expert feedback to improve system performance in specific clinical contexts.
Anthology ID:
2024.clinicalnlp-1.7
Volume:
Proceedings of the 6th Clinical Natural Language Processing Workshop
Month:
June
Year:
2024
Address:
Mexico City, Mexico
Editors:
Tristan Naumann, Asma Ben Abacha, Steven Bethard, Kirk Roberts, Danielle Bitterman
Venues:
ClinicalNLP | WS
Publisher:
Association for Computational Linguistics
Pages:
70–81
URL:
https://aclanthology.org/2024.clinicalnlp-1.7
Cite (ACL):
Siting Liang and Daniel Sonntag. 2024. Building A German Clinical Named Entity Recognition System without In-domain Training Data. In Proceedings of the 6th Clinical Natural Language Processing Workshop, pages 70–81, Mexico City, Mexico. Association for Computational Linguistics.
Cite (Informal):
Building A German Clinical Named Entity Recognition System without In-domain Training Data (Liang & Sonntag, ClinicalNLP-WS 2024)
PDF:
https://preview.aclanthology.org/jeptaln-2024-ingestion/2024.clinicalnlp-1.7.pdf