Abstract
Building tools to remove sensitive information such as personal names, addresses, and telephone numbers, so-called Protected Health Information (PHI), from clinical free text is an important task for making clinical texts available for research. The quality of these de-identification tools must be assessed in terms of precision and recall. Such an assessment requires gold standards, that is, annotated clinical text. Such gold standards exist for larger languages. For Norwegian, however, there are no such resources. Therefore, an already existing Norwegian synthetic clinical corpus, NorSynthClinical, has been extended with PHIs and annotated by two annotators, obtaining an inter-annotator agreement of 0.94 F1-measure. In total, the corpus has 409 annotated PHI instances and is called NorSynthClinical PHI. A hybrid de-identification tool (combining machine learning and rule-based methods) for Norwegian was developed and trained with openly available resources, and obtained an overall F1-measure of 0.73 and a recall of 0.62 when tested on NorSynthClinical PHI. NorSynthClinical PHI is openly available on GitHub for use by the research community.
- Anthology ID:
- 2021.nodalida-main.22
- Volume:
- Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa)
- Month:
- 31 May–2 June
- Year:
- 2021
- Address:
- Reykjavik, Iceland (Online)
- Venue:
- NoDaLiDa
- Publisher:
- Linköping University Electronic Press, Sweden
- Pages:
- 222–230
- URL:
- https://aclanthology.org/2021.nodalida-main.22
- Cite (ACL):
- Synnøve Bråthen, Wilhelm Wie, and Hercules Dalianis. 2021. Creating and Evaluating a Synthetic Norwegian Clinical Corpus for De-Identification. In Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa), pages 222–230, Reykjavik, Iceland (Online). Linköping University Electronic Press, Sweden.
- Cite (Informal):
- Creating and Evaluating a Synthetic Norwegian Clinical Corpus for De-Identification (Bråthen et al., NoDaLiDa 2021)
- PDF:
- https://preview.aclanthology.org/ingestion-script-update/2021.nodalida-main.22.pdf