Detecting Personal Identifiable Information in Swedish Learner Essays
Maria Irena Szawerna, Simon Dobnik, Ricardo Muñoz Sánchez, Therese Lindström Tiedemann, Elena Volodina
Abstract
Linguistic data can, and often does, contain PII (Personal Identifiable Information). From both a legal and an ethical standpoint, the sharing of such data is not permissible. According to the GDPR, pseudonymization, i.e. the replacement of sensitive information with surrogates, is an acceptable strategy for privacy preservation. While research has been conducted on the detection and replacement of sensitive data in Swedish medical data using Large Language Models (LLMs), it is unclear whether these models handle PII in less structured and more thematically varied texts equally well. In this paper, we present and discuss the performance of an LLM-based PII-detection system for Swedish learner essays.
- Anthology ID:
- 2024.caldpseudo-1.7
- Volume:
- Proceedings of the Workshop on Computational Approaches to Language Data Pseudonymization (CALD-pseudo 2024)
- Month:
- March
- Year:
- 2024
- Address:
- St. Julian’s, Malta
- Editors:
- Elena Volodina, David Alfter, Simon Dobnik, Therese Lindström Tiedemann, Ricardo Muñoz Sánchez, Maria Irena Szawerna, Xuan-Son Vu
- Venues:
- CALD-pseudo | WS
- Publisher:
- Association for Computational Linguistics
- Pages:
- 54–63
- URL:
- https://aclanthology.org/2024.caldpseudo-1.7
- Cite (ACL):
- Maria Irena Szawerna, Simon Dobnik, Ricardo Muñoz Sánchez, Therese Lindström Tiedemann, and Elena Volodina. 2024. Detecting Personal Identifiable Information in Swedish Learner Essays. In Proceedings of the Workshop on Computational Approaches to Language Data Pseudonymization (CALD-pseudo 2024), pages 54–63, St. Julian’s, Malta. Association for Computational Linguistics.
- Cite (Informal):
- Detecting Personal Identifiable Information in Swedish Learner Essays (Szawerna et al., CALD-pseudo-WS 2024)
- PDF:
- https://preview.aclanthology.org/add_acl24_videos/2024.caldpseudo-1.7.pdf