Post Hoc Agentic Refinement for Improving Precision in Multilingual Clinical Text De-identification

Justin Xu, Alistair Johnson, Thomas Lin, David Eyre, Rodolfo Quispe


Abstract
De-identification systems prioritize recall to protect privacy, but excessive over-tagging reduces data utility. We propose an agentic refiner that reviews high-recall annotations using lightweight tools (validation functions, adaptive context retrieval, persistent to-do state, and modular review skills) to improve precision while minimizing recall loss. Experiments on three multilingual datasets show that the agent significantly improves binary precision. To support fine-grained analysis, we further introduce a synthetic error dataset of common and systemic failure modes, on which the agent corrects 99% of injected errors in the medical datasets. Our results suggest that agent-based refinement provides a flexible and effective mechanism for improving de-identification precision as a modular extension to existing high-recall systems.
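As an illustration of the trade-off the abstract describes, the following minimal sketch shows how a post hoc pass over high-recall annotations can raise binary precision without touching recall. It is not the authors' implementation: the `Span` type, the `looks_like_date` validator, the `refine` function, and the character-level definition of binary precision (entity type ignored, only PHI versus non-PHI counted) are all assumptions made for this example.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    start: int  # inclusive character offset
    end: int    # exclusive character offset
    label: str


def find(text: str, sub: str, label: str) -> Span:
    """Helper for the example: locate the first occurrence of `sub` in `text`."""
    i = text.index(sub)
    return Span(i, i + len(sub), label)


def looks_like_date(value: str) -> bool:
    """Hypothetical validation function: accept only YYYY-MM-DD shaped strings."""
    return re.fullmatch(r"\d{4}-\d{2}-\d{2}", value) is not None


def refine(text: str, spans: list[Span]) -> list[Span]:
    """Drop DATE spans that fail validation; leave every other span untouched."""
    kept = []
    for s in spans:
        if s.label == "DATE" and not looks_like_date(text[s.start:s.end]):
            continue  # treated as an over-tagged span and removed
        kept.append(s)
    return kept


def binary_precision_recall(pred: list[Span], gold: list[Span]) -> tuple[float, float]:
    """Character-level binary scores (assumed definition; labels are ignored)."""
    pred_chars = {i for s in pred for i in range(s.start, s.end)}
    gold_chars = {i for s in gold for i in range(s.start, s.end)}
    tp = len(pred_chars & gold_chars)
    precision = tp / len(pred_chars) if pred_chars else 1.0
    recall = tp / len(gold_chars) if gold_chars else 1.0
    return precision, recall


text = "Seen on 2024-03-05, dose 10-20 mg."
gold = [find(text, "2024-03-05", "DATE")]
# A high-recall tagger that also flags the dose range as a date.
pred = gold + [find(text, "10-20", "DATE")]

before = binary_precision_recall(pred, gold)
after = binary_precision_recall(refine(text, pred), gold)
```

In this toy case the dose range is the only over-tagged span, so removing it raises precision while recall is unchanged. A real refiner would also need context to avoid discarding true identifiers that merely fail a format check.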
Anthology ID:
2026.bionlp-1.11
Volume:
BioNLP 2026
Month:
July
Year:
2026
Address:
San Diego, California
Editors:
Dina Demner-Fushman, Sophia Ananiadou, Kirk Roberts, Junichi Tsujii
Venues:
BioNLP | WS
Publisher:
Association for Computational Linguistics
Pages:
115–127
URL:
https://preview.aclanthology.org/ingest-acl-workshops/2026.bionlp-1.11/
Cite (ACL):
Justin Xu, Alistair Johnson, Thomas Lin, David Eyre, and Rodolfo Quispe. 2026. Post Hoc Agentic Refinement for Improving Precision in Multilingual Clinical Text De-identification. In BioNLP 2026, pages 115–127, San Diego, California. Association for Computational Linguistics.
Cite (Informal):
Post Hoc Agentic Refinement for Improving Precision in Multilingual Clinical Text De-identification (Xu et al., BioNLP 2026)
PDF:
https://preview.aclanthology.org/ingest-acl-workshops/2026.bionlp-1.11.pdf