Xuanren Chen


2026

Patient–trial retrieval is a challenging problem that requires nuanced clinical reasoning beyond surface-level semantic similarity. However, relevance annotations are scarce and costly, so existing approaches rely on very limited supervision or zero-shot transfer, which reduces the task to generic semantic matching and fails to capture multi-factor eligibility reasoning. To address this, we propose FACTrial, a factorized contrastive training framework that uses LLMs to synthesize diagnosis-aware supervision for scalable patient–trial retrieval. FACTrial decomposes each patient note into a primary diagnosis and a set of concomitant, eligibility-triggering conditions, and constructs complementary contrastive signals through structured trial augmentation. Specifically, we generate primary-target and concomitant-target positives, together with clinically confusable near-miss negatives, to enforce diagnostic specificity under contrastive learning. Two specialized bi-encoder experts are trained, one prioritizing the primary diagnosis and the other targeting concomitant-driven recall, and are then fused into a single deployable retriever. Experiments on three public benchmarks demonstrate that FACTrial achieves state-of-the-art performance, improving both top-ranked quality and high-recall coverage.
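As an illustration of how a near-miss negative sharpens a contrastive signal, the following is a minimal sketch of an InfoNCE-style loss for one patient embedding, one positive trial, and a list of negative trials. The abstract does not specify the loss, similarity function, or temperature, so the InfoNCE form, the cosine similarity, the function names `cosine` and `info_nce`, the `temperature` default, and the toy 2-d vectors are all assumptions made for illustration, not the authors' implementation.

```python
import math

def cosine(u, v):
    # Cosine similarity between two non-zero vectors given as lists of floats.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def info_nce(patient, positive, negatives, temperature=0.1):
    # Hypothetical InfoNCE-style loss: negative log-softmax of the positive
    # trial's similarity against the positive plus all negatives.
    # The temperature value is an assumed default, not taken from the paper.
    logits = [cosine(patient, positive) / temperature]
    logits += [cosine(patient, neg) / temperature for neg in negatives]
    # Log-sum-exp with max subtraction for numerical stability.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_denominator - logits[0]

# Toy embeddings (illustrative only).
patient = [1.0, 0.0]
positive = [1.0, 0.0]
near_miss = [0.9, 0.1]   # clinically confusable: close to the patient
unrelated = [0.0, 1.0]   # easy negative: orthogonal to the patient

loss_with_near_miss = info_nce(patient, positive, [near_miss])
loss_with_unrelated = info_nce(patient, positive, [unrelated])
```

In this toy setting the near-miss negative yields a larger loss than the unrelated one, which is the intuition behind using confusable negatives: they contribute a training signal where easy negatives contribute almost none.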