Budget-Aware Routing for Long Clinical Text

Khizar Qureshi, Geoffrey Martin, Yifan Peng


Abstract
A key challenge for large language models is token cost per query and overall deployment cost. Clinical inputs are long, heterogeneous, and often redundant, while downstream tasks are short and high stakes. We study budgeted context selection, where a subset of document units is chosen under a strict token budget so an off-the-shelf generator can meet fixed cost and latency constraints. We cast this as a knapsack-constrained subset selection problem with two design choices: unitization, which defines how the document is segmented, and selection, which determines which units are kept. We propose RCD, a monotone submodular objective that balances relevance, coverage, and diversity. We compare sentence, section, window, and cluster-based unitization, and introduce a routing heuristic that adapts to the budget regime. Experiments on MIMIC discharge notes, Cochrane abstracts, and L-Eval show that optimal strategies depend on the evaluation setting. Positional heuristics perform best at low budgets in extractive tasks, while diversity-aware methods such as MMR improve LLM generation. Selector choice matters more than unitization, with cluster-based grouping reducing performance and other schemes behaving similarly. ROUGE saturates for LLM summaries, while BERTScore better reflects quality differences.
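The budgeted selection setting described in the abstract can be illustrated with a small sketch. This is not the paper's RCD objective or its routing heuristic, whose definitions are not given here. It is a generic MMR-style greedy selector under a token budget, with several assumptions made purely for illustration: the function name `select_units`, whitespace tokenization as the cost measure, Jaccard overlap as both the relevance and redundancy score, and the trade-off weight `lam`.

```python
# Illustrative sketch only: MMR-style greedy selection under a token budget.
# Assumptions (not from the paper): whitespace tokens as cost, Jaccard overlap
# as the similarity measure, and the hypothetical names used below.

def tokens(text):
    """Lowercased whitespace tokens; stands in for a real tokenizer."""
    return text.lower().split()


def jaccard(a, b):
    """Jaccard overlap between two token lists (0.0 if either is empty)."""
    a, b = set(a), set(b)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def select_units(units, query, budget, lam=0.7):
    """Greedily pick units that fit the remaining budget, trading off
    relevance to the query against redundancy with units already chosen.
    Returns (indices in document order, tokens used)."""
    toks = [tokens(u) for u in units]
    q = tokens(query)
    chosen, used = [], 0
    remaining = set(range(len(units)))
    while True:
        best, best_score = None, float("-inf")
        for i in sorted(remaining):
            cost = len(toks[i])
            # Skip empty units and units that would exceed the budget.
            if cost == 0 or used + cost > budget:
                continue
            relevance = jaccard(toks[i], q)
            redundancy = max(
                (jaccard(toks[i], toks[j]) for j in chosen), default=0.0
            )
            score = lam * relevance - (1 - lam) * redundancy
            if score > best_score:
                best, best_score = i, score
        if best is None:  # nothing left that fits
            break
        chosen.append(best)
        used += len(toks[best])
        remaining.remove(best)
    return sorted(chosen), used


if __name__ == "__main__":
    units = [
        "patient admitted with chest pain",
        "patient admitted with chest pain",  # verbatim duplicate
        "discharged on aspirin daily",
        "follow up in two weeks",
    ]
    print(select_units(units, "chest pain aspirin", budget=9))
    print(select_units(units, "chest pain aspirin", budget=14))
```

In this toy example the sentences play the role of units, so changing the unitization would only change how `units` is built. The redundancy penalty is what keeps the duplicated sentence out of the selection when the budget is large enough to admit a third unit.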
Anthology ID:
2026.findings-acl.2114
Volume:
Findings of the Association for Computational Linguistics: ACL 2026
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
42583–42598
URL:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.2114/
Cite (ACL):
Khizar Qureshi, Geoffrey Martin, and Yifan Peng. 2026. Budget-Aware Routing for Long Clinical Text. In Findings of the Association for Computational Linguistics: ACL 2026, pages 42583–42598, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
Budget-Aware Routing for Long Clinical Text (Qureshi et al., Findings 2026)
PDF:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.2114.pdf
Checklist:
 2026.findings-acl.2114.checklist.pdf