Abstract
We introduce the problem of automatic textual anonymisation and present a new publicly-available, pseudonymised benchmark corpus of personal email text for the task, dubbed ITAC (Informal Text Anonymisation Corpus). We discuss the method by which the corpus was constructed, and consider some important issues related to the evaluation of textual anonymisation systems. We also present some initial baseline results on the new corpus using a state of the art HMM-based tagger. We introduce the problem of automatic textual anonymisation and present a new publicly-available, pseudonymised benchmark corpus of personal email text for the task, dubbed ITAC (Informal Text Anonymisation Corpus). We discuss the method by which the corpus was constructed, and consider some important issues related to the evaluation of textual anonymisation systems. We also present some initial baseline results on the new corpus using a state of the art HMM-based tagger.- Anthology ID:
- L06-1110
- Volume:
- Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)
- Month:
- May
- Year:
- 2006
- Address:
- Genoa, Italy
- Venue:
- LREC
- SIG:
- Publisher:
- European Language Resources Association (ELRA)
- Note:
- Pages:
- Language:
- URL:
- http://www.lrec-conf.org/proceedings/lrec2006/pdf/200_pdf.pdf
- DOI:
- Cite (ACL):
- Ben Medlock. 2006. An Introduction to NLP-based Textual Anonymisation. In Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06), Genoa, Italy. European Language Resources Association (ELRA).
- Cite (Informal):
- An Introduction to NLP-based Textual Anonymisation (Medlock, LREC 2006)
- PDF:
- http://www.lrec-conf.org/proceedings/lrec2006/pdf/200_pdf.pdf