Abstract
This paper describes HB Deid, a freely available web-based demonstrator. HB Deid identifies protected health information (PHI) in Swedish text and removes it, masks it, or replaces it with surrogates (pseudonyms). PHI consists of named entities such as personal names, locations, ages, phone numbers, and dates. To find PHI, HB Deid uses a conditional random field (CRF) model trained on non-sensitive annotated Swedish text, followed by a rule-based post-processing step. The final step obscures each PHI instance in one of three ways: masking it, showing only its class name, or replacing it using a rule-based pseudonymisation system.
- Anthology ID:
- 2021.nodalida-main.54
- Volume:
- Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa)
- Month:
- 31 May–2 June
- Year:
- 2021
- Address:
- Reykjavik, Iceland (Online)
- Editors:
- Simon Dobnik, Lilja Øvrelid
- Venue:
- NoDaLiDa
- Publisher:
- Linköping University Electronic Press, Sweden
- Pages:
- 467–471
- URL:
- https://aclanthology.org/2021.nodalida-main.54
- Cite (ACL):
- Hanna Berg and Hercules Dalianis. 2021. HB Deid - HB De-identification tool demonstrator. In Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa), pages 467–471, Reykjavik, Iceland (Online). Linköping University Electronic Press, Sweden.
- Cite (Informal):
- HB Deid - HB De-identification tool demonstrator (Berg & Dalianis, NoDaLiDa 2021)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-2/2021.nodalida-main.54.pdf
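
The three ways of obscuring PHI named in the abstract (masking, showing only the class name, and pseudonymisation) can be illustrated with a minimal sketch. This is not HB Deid's code: the function `obscure`, the `SURROGATES` table, the label names, and the span format are all hypothetical, and the PHI spans are assumed to have already been produced by a detection step such as the CRF model plus post-processing rules.

```python
from typing import Dict, List, Tuple

# A PHI span: (start offset, end offset (exclusive), class label).
# This format is an assumption made for illustration only.
Span = Tuple[int, int, str]

# Hypothetical surrogate table. A real rule-based pseudonymiser would
# choose replacements with far richer rules than a fixed lookup.
SURROGATES: Dict[str, str] = {
    "First_Name": "Anna",
    "Location": "Uppsala",
}


def obscure(text: str, spans: List[Span], mode: str = "mask") -> str:
    """Return `text` with every span obscured according to `mode`.

    mode="mask"      replaces each PHI character with "X"
    mode="class"     replaces the PHI with its class name in brackets
    mode="pseudonym" replaces the PHI with a surrogate from SURROGATES,
                     falling back to the class name if none is listed
    """
    if mode not in {"mask", "class", "pseudonym"}:
        raise ValueError(f"unknown mode: {mode!r}")

    pieces: List[str] = []
    cursor = 0
    for start, end, label in sorted(spans):
        if start < cursor:
            raise ValueError("overlapping spans are not supported")
        pieces.append(text[cursor:start])  # untouched text before the span
        if mode == "mask":
            pieces.append("X" * (end - start))
        elif mode == "class":
            pieces.append(f"[{label}]")
        else:
            pieces.append(SURROGATES.get(label, f"[{label}]"))
        cursor = end
    pieces.append(text[cursor:])  # untouched text after the last span
    return "".join(pieces)


if __name__ == "__main__":
    sentence = "Kalle bor i Lund."
    phi = [(0, 5, "First_Name"), (12, 16, "Location")]
    for mode in ("mask", "class", "pseudonym"):
        print(mode, "->", obscure(sentence, phi, mode))
```

Working from character offsets rather than re-searching for the PHI strings keeps the replacement step independent of how the spans were found, so the same function serves all three output modes.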