Alistair Johnson
2026
Post Hoc Agentic Refinement for Improving Precision in Multilingual Clinical Text De-identification
Justin Xu | Alistair Johnson | Thomas Lin | David Eyre | Rodolfo Quispe
BioNLP 2026
De-identification systems prioritize recall to protect privacy, but excessive over-tagging reduces data utility. We propose an agentic refiner that reviews high-recall annotations using lightweight tools (validation functions, adaptive context retrieval, persistent to-do state, and modular review skills) to improve precision while minimizing recall loss. Experiments across three multilingual datasets show that the agent achieves significant improvements to binary precision. To support fine-grained analysis, we further introduce a synthetic error dataset of common and systemic failure modes, on which the agent corrects 99% of injected errors in the medical datasets. Our results suggest that agent-based refinement provides a flexible and effective mechanism for improving de-identification precision as a modular extension to existing high-recall systems.