David Eyre
2026
Discharge Instructions are not One Task: Grounding Differences Between Surgical and Non-Surgical Admissions
Mayank Jobanputra | Justin Xu | Samarth Oza | Hulma Naseer | Yifan Wang | Blerta Veseli | Chandralekha Kona | Haochen Cui | David Eyre | Vera Demberg
BioNLP 2026
Discharge instructions are patient-facing, safety-critical documents that guide medication use, follow-up care, and recovery after hospitalization. Because they must synthesize information across the clinical record and often include post-discharge guidance not stated verbatim in the Electronic Health Record (EHR), they are a difficult target for clinical text generation. In this work, we study discharge instructions in MIMIC-IV through a grounding-first lens. Using two LLMs, we decompose each discharge instruction into medically relevant statements and verify them against the EHR. We find that discharge instructions for Surgical admissions are much longer, averaging roughly 24–25 statements per admission versus 11–12 in Non-Surgical cases, while supported content remains similar in absolute count. The additional Surgical content is dominated by statements that are not directly stated in the record or require clinically plausible extrapolation. Through this analysis, we advocate for better grounding and completeness evaluations at a fine-grained level, establishing a foundational step toward safer and more reliable discharge-instruction generation.
Post Hoc Agentic Refinement for Improving Precision in Multilingual Clinical Text De-identification
Justin Xu | Alistair Johnson | Thomas Lin | David Eyre | Rodolfo Quispe
BioNLP 2026
De-identification systems prioritize recall to protect privacy, but excessive over-tagging reduces data utility. We propose an agentic refiner that reviews high-recall annotations using lightweight tools (validation functions, adaptive context retrieval, persistent to-do state, and modular review skills) to improve precision while minimizing recall loss. Experiments across three multilingual datasets show that the agent achieves significant improvements to binary precision. To support fine-grained analysis, we further introduce a synthetic error dataset of common and systemic failure modes, on which the agent corrects 99% of injected errors in the medical datasets. Our results suggest that agent-based refinement provides a flexible and effective mechanism for improving de-identification precision as a modular extension to existing high-recall systems.