Yuxing Lu


2026

Large Language Models (LLMs) have demonstrated remarkable capabilities on general text; however, their proficiency in specialized scientific domains that require deep, interconnected knowledge remains largely uncharacterized. Metabolomics presents unique challenges with its complex biochemical pathways, heterogeneous identifier systems, and fragmented databases. To systematically evaluate LLM capabilities in this domain, we introduce MetaBench, the first benchmark for metabolomics assessment. Curated from authoritative public resources, MetaBench evaluates five capabilities essential for metabolomics research: knowledge, understanding, grounding, reasoning, and research. Our evaluation of 25 open- and closed-source LLMs reveals distinct performance patterns across metabolomics tasks: while models perform well on text generation tasks, cross-database identifier grounding remains challenging even with retrieval augmentation. Model performance also decreases on long-tail metabolites with sparse annotations. With MetaBench, we provide essential infrastructure for developing and evaluating metabolomics AI systems, enabling systematic progress toward reliable computational tools for metabolomics research.
Zero-Shot Composed Image Retrieval (ZS-CIR) retrieves target images using a reference image and modification text without task-specific training. Existing methods typically rely on MLLMs to generate query vectors with pre-trained models like CLIP. However, these constructed queries suffer from inherent cognitive bias because the candidate distribution is unknown. We propose CoRR, a training-free framework that reframes ZS-CIR as a self-correcting process through bias-aware query refinement. CoRR uses retrieved results as feedback to perceive the candidate distribution. With carefully designed CoT prompting, the MLLM inspects the retrieved candidates to identify intent misalignments in the query and then corrects them via Historical Query Fusion. We also introduce Retrieval-Driven Caption Optimization to provide context-aligned examples, reducing phrasing and style mismatches. Experiments on public benchmarks show that CoRR significantly outperforms other SOTA methods.

2024

Large Language Models (LLMs) have revolutionized text generation across diverse domains, showcasing an ability to produce human-like text with remarkable accuracy. Yet these models frequently encounter a significant hurdle: producing hallucinations, a flaw particularly detrimental in the healthcare domain, where precision is crucial. In this paper, we introduce ClinicalRAG, a novel multi-agent pipeline that addresses this issue by incorporating heterogeneous medical knowledge, both structured and unstructured, into LLMs to bolster diagnosis accuracy. ClinicalRAG can extract related medical entities from user inputs and dynamically integrate relevant medical knowledge during the text generation process. Comparative analyses reveal that ClinicalRAG significantly outperforms knowledge-deficient methods, offering enhanced reliability in clinical decision support. This advancement marks a pivotal proof-of-concept step towards mitigating misinformation risks in healthcare applications of LLMs.

2023

Artificial intelligence-based diagnosis systems have emerged as powerful tools for reforming traditional medical care. Clinicians increasingly want their own intelligent diagnostic partner to expand the range of services they can provide. When reading a clinical note, experts make inferences using relevant knowledge. However, medical knowledge is heterogeneous, comprising both structured and unstructured knowledge, and existing approaches cannot unify the two well. In addition, the descriptions of clinical findings in clinical notes, from which a diagnosis is reasoned, vary considerably across diseases and patients. To address these problems, we propose a Medical Knowledge-enhanced Prompt Learning (MedKPL) model for diagnosis classification. First, to overcome the heterogeneity of knowledge, MedKPL extracts the knowledge relevant to a diagnosis and normalizes it into a prompt sequence. Then, MedKPL integrates the knowledge prompt with the clinical note into a designed prompt for representation. As a result, MedKPL can integrate medical knowledge into the model to enhance diagnosis and can effectively transfer learned diagnostic capacity to unseen diseases by alternating the relevant disease knowledge. Experimental results on two medical datasets show that our method achieves better medical text classification results and performs better in transfer and few-shot settings across datasets of different diseases.