Abstract
Automated text anonymization is a classical problem in Natural Language Processing (NLP). The topic has evolved immensely over the years, with the first list-search and rule-based solutions giving way to statistical modeling approaches and later to advanced systems that rely on powerful state-of-the-art language models. Even so, these solutions have not been widely adopted in the most privacy-demanding areas of activity, such as healthcare; none of them is perfect, and most cannot guarantee rigorous anonymization. This paper presents INCOGNITUS, a flexible platform for the automated anonymization of clinical notes that offers the possibility of applying different techniques. The available tools include an underexplored yet promising method that guarantees 100% recall by replacing each word with a semantically identical one. In addition, the presented framework incorporates a performance evaluation module that computes a novel metric for information loss assessment in real time.

- Anthology ID: 2023.eacl-demo.22
- Volume: Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations
- Month: May
- Year: 2023
- Address: Dubrovnik, Croatia
- Venue: EACL
- Publisher: Association for Computational Linguistics
- Pages: 187–194
- URL: https://aclanthology.org/2023.eacl-demo.22
- Cite (ACL): Bruno Ribeiro, Vitor Rolla, and Ricardo Santos. 2023. INCOGNITUS: A Toolbox for Automated Clinical Notes Anonymization. In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations, pages 187–194, Dubrovnik, Croatia. Association for Computational Linguistics.
- Cite (Informal): INCOGNITUS: A Toolbox for Automated Clinical Notes Anonymization (Ribeiro et al., EACL 2023)
- PDF: https://preview.aclanthology.org/remove-xml-comments/2023.eacl-demo.22.pdf