Matthew Zent


2025

PIIvot: A Lightweight NLP Anonymization Framework for Question-Anchored Tutoring Dialogues
Matthew Zent | Digory Smith | Simon Woodhead
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

Personally identifiable information (PII) anonymization is a high-stakes task that poses a barrier to many open-science data sharing initiatives. While PII identification has made large strides in recent years, in practice, error thresholds and the recall/precision trade-off still limit the uptake of these anonymization pipelines. We present PIIvot, a lighter-weight framework for PII anonymization that leverages knowledge of the data context to simplify the PII detection problem. To demonstrate its effectiveness, we also contribute QATD_2k, the largest open-source real-world tutoring dataset of its kind, to support the demand for quality educational dialogue data.