Reasoning-Enhanced Retrieval for Misconception Prediction: A RAG-Inspired Approach with LLMs

Chaudhary Divya, Chang Xue, Shaorui Sun


Abstract
Large language models (LLMs) are increasingly deployed in clinical decision support, yet subtle demographic cues can influence their reasoning. Prior work has documented disparities in outputs across patient groups, but little is known about how internal reasoning shifts under controlled demographic changes. We introduce MEDEQUALQA, a counterfactual benchmark that perturbs only patient pronouns (he/him, she/her, they/them) while holding critical symptoms and conditions (CSCs) constant. Each vignette is expanded into single-CSC ablations, producing three parallel datasets of approximately 23k items each (69k total). We evaluate a frontier LLM and compute Semantic Textual Similarity (STS) between reasoning traces to measure stability across pronoun variants. Our results show overall high similarity (mean STS > 0.80) but reveal consistent localized divergences in cited risk factors, guideline anchors, and differential-diagnosis ordering, even when final diagnoses remain unchanged. Error analysis identifies specific cases where reasoning shifts occur, highlighting clinically relevant bias loci that may cascade into inequitable care. MEDEQUALQA provides a controlled diagnostic setting for auditing reasoning stability in medical AI.
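The stability measure described in the abstract reduces to pairwise STS between the reasoning traces produced for the he/she/they variants of each vignette. The paper does not name its embedding model or scoring code; the sketch below is an illustrative reconstruction assuming a sentence-transformers encoder, with `all-MiniLM-L6-v2` and the `reasoning_stability` helper as hypothetical placeholders.

```python
# Minimal sketch of the pronoun-variant STS stability check.
# Assumption: cosine similarity over sentence-transformers embeddings;
# the paper does not specify its STS method or model.
from itertools import combinations
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder encoder

def reasoning_stability(traces: dict[str, str]) -> dict[tuple[str, str], float]:
    """Pairwise cosine STS between the reasoning traces for one vignette,
    keyed by pronoun variant ("he", "she", "they")."""
    variants = sorted(traces)
    embeddings = {v: model.encode(traces[v], convert_to_tensor=True) for v in variants}
    return {
        (a, b): util.cos_sim(embeddings[a], embeddings[b]).item()
        for a, b in combinations(variants, 2)
    }

# Usage: flag a vignette as unstable if any variant pair falls below
# the 0.80 level the abstract reports as the overall mean similarity.
scores = reasoning_stability({
    "he": "...",    # model reasoning trace for the he/him variant
    "she": "...",   # she/her variant
    "they": "...",  # they/them variant
})
unstable = min(scores.values()) < 0.80
```

With three variants this yields three pairwise scores per vignette, so localized divergences (e.g., he vs. they) remain visible rather than being averaged away.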
Anthology ID: 2025.sciprodllm-1.5
Volume: Proceedings of The First Workshop on Human–LLM Collaboration for Ethical and Responsible Science Production (SciProdLLM)
Month: December
Year: 2025
Address: Mumbai, India (Hybrid)
Editors: Wei Zhao, Jennifer D’Souza, Steffen Eger, Anne Lauscher, Yufang Hou, Nafise Sadat Moosavi, Tristan Miller, Chenghua Lin
Venues: SciProdLLM | WS
Publisher: Association for Computational Linguistics
Pages: 38–51
URL: https://preview.aclanthology.org/ingest-ijcnlp-aacl/2025.sciprodllm-1.5/
Cite (ACL): Chaudhary Divya, Chang Xue, and Shaorui Sun. 2025. Reasoning-Enhanced Retrieval for Misconception Prediction: A RAG-Inspired Approach with LLMs. In Proceedings of The First Workshop on Human–LLM Collaboration for Ethical and Responsible Science Production (SciProdLLM), pages 38–51, Mumbai, India (Hybrid). Association for Computational Linguistics.
Cite (Informal): Reasoning-Enhanced Retrieval for Misconception Prediction: A RAG-Inspired Approach with LLMs (Divya et al., SciProdLLM 2025)
PDF: https://preview.aclanthology.org/ingest-ijcnlp-aacl/2025.sciprodllm-1.5.pdf