Samarth Oza
2026
Discharge Instructions are not One Task: Grounding Differences Between Surgical and Non-Surgical Admissions
Mayank Jobanputra | Justin Xu | Samarth Oza | Hulma Naseer | Yifan Wang | Blerta Veseli | Chandralekha Kona | Haochen Cui | David Eyre | Vera Demberg
BioNLP 2026
Discharge instructions are patient-facing, safety-critical documents that guide medication use, follow-up care, and recovery after hospitalization. Because they must synthesize information across the clinical record and often include post-discharge guidance not stated verbatim in the Electronic Health Record (EHR), they are a difficult target for clinical text generation. In this work, we study discharge instructions in MIMIC-IV through a grounding-first lens. Using two LLMs, we decompose each discharge instruction into medically relevant statements and verify them against the EHR. We find that discharge instructions for Surgical admissions are much longer, averaging roughly 24–25 statements per admission versus 11–12 in Non-Surgical cases, while supported content remains similar in absolute count. The additional Surgical content is dominated by statements that are not directly stated in the record or that require clinically plausible extrapolation. Through this analysis, we advocate for better fine-grained grounding and completeness evaluations, establishing a foundational step toward safer and more reliable discharge-instruction generation.
2025
Tree-of-Quote Prompting Improves Factuality and Attribution in Multi-Hop and Medical Reasoning
Justin Xu | Yiming Li | Zizheng Zhang | Augustine Yui Hei Luk | Mayank Jobanputra | Samarth Oza | Ashley Murray | Meghana Reddy Kasula | Andrew Parker | David W Eyre
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Large language models (LLMs) can produce fluent but factually incorrect outputs and often have limited ability to attribute their claims to source material. This undermines their reliability, particularly in multi-hop and high-stakes domains such as medicine. We propose Tree-of-Quote (ToQ) prompting, a structured framework that decomposes complex questions into subquestions, generates quotes to support each step without retrieval, and selectively advances reasoning based on quote quality. We also introduce FQ-Score, a unified metric that captures answer correctness, attribution fidelity, and reasoning quality. Experiments on StrategyQA, 2WikiMultiHopQA, MuSiQue, MoreHopQA, and MedQA demonstrate that ToQ improves factuality and attribution over standard prompting baselines. To validate FQ-Score as a proxy for human judgment, we conduct two reader studies with clinicians on medical questions and observe strong correlations between FQ-Scores and clinician scores. Both clinician scores and FQ-Scores also indicate a preference for ToQ over baselines due to a combination of greater correctness, completeness, and logical flow. Our results suggest ToQ is a promising approach for building more trustworthy and auditable LLM systems.