Post Hoc Agentic Refinement for Improving Precision in Multilingual Clinical Text De-identification
Justin Xu, Alistair Johnson, Thomas Lin, David Eyre, Rodolfo Quispe
Abstract
De-identification systems prioritize recall to protect privacy, but excessive over-tagging reduces data utility. We propose an agentic refiner that reviews high-recall annotations using lightweight tools (validation functions, adaptive context retrieval, persistent to-do state, and modular review skills) to improve precision while minimizing recall loss. Experiments across three multilingual datasets show that the agent achieves significant improvements to binary precision. To support fine-grained analysis, we further introduce a synthetic error dataset of common and systemic failure modes, on which the agent corrects 99% of injected errors in the medical datasets. Our results suggest that agent-based refinement provides a flexible and effective mechanism for improving de-identification precision as a modular extension to existing high-recall systems.
- Anthology ID:
- 2026.bionlp-1.11
- Volume:
- BioNLP 2026
- Month:
- July
- Year:
- 2026
- Address:
- San Diego, California
- Editors:
- Dina Demner-Fushman, Sophia Ananiadou, Kirk Roberts, Junichi Tsujii
- Venues:
- BioNLP | WS
- Publisher:
- Association for Computational Linguistics
- Pages:
- 115–127
- URL:
- https://preview.aclanthology.org/ingest-acl-workshops/2026.bionlp-1.11/
- Cite (ACL):
- Justin Xu, Alistair Johnson, Thomas Lin, David Eyre, and Rodolfo Quispe. 2026. Post Hoc Agentic Refinement for Improving Precision in Multilingual Clinical Text De-identification. In BioNLP 2026, pages 115–127, San Diego, California. Association for Computational Linguistics.
- Cite (Informal):
- Post Hoc Agentic Refinement for Improving Precision in Multilingual Clinical Text De-identification (Xu et al., BioNLP 2026)
- PDF:
- https://preview.aclanthology.org/ingest-acl-workshops/2026.bionlp-1.11.pdf