Towards semantic reliable clinical QA: Query pipeline optimization for cancer patient question answering systems

MaoLin He, Rena Wei Gao, Mike Conway, Brian E. Chapman


Abstract
Large Language Models (LLMs) show promise in medical Question-Answering (QA) but suffer from hallucinations that jeopardize patient safety. While Retrieval-Augmented Generation (RAG) mitigates this by grounding outputs in external evidence, existing pipelines struggle with the complex, rapidly evolving nature of oncology. We present **CoMeta**, a three-level controllable metadata-aware framework optimized for Cancer Patient QA (CPQA). We introduce Clinical Hybrid Semantic-Symbolic Document Retrieval (CHSDR), which synergizes real-time Boolean search via NCBI E-Utilities with semantic retrieval to overcome metadata blindness. Additionally, we propose Semantic Enhanced Overlap Segmentation (SEOS) to prevent contextual fragmentation. Our results demonstrate that CHSDR significantly improves retrieval performance and that CoMeta improves the answer accuracy of Claude-3-haiku by 5.24% over chain-of-thought prompting and by about 3% over a naive RAG setup. This study highlights the importance of domain-specific query optimization in realizing the full potential of RAG and provides a robust framework for building more reliable CPQA systems.
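The abstract names two ingredients that a short sketch can make concrete: a Boolean query sent to NCBI E-Utilities, and overlapping segmentation of retrieved text. The Python below is only an illustration under stated assumptions, not the paper's CHSDR or SEOS implementation. The helper names `build_esearch_url` and `overlap_chunks`, the `[Title/Abstract]` field restriction, and the window sizes are all hypothetical choices. The chunker uses a plain fixed-size overlap and does not reproduce whatever semantic enhancement SEOS applies.

```python
from urllib.parse import urlencode

# Public ESearch endpoint of NCBI E-Utilities.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(terms, retmax=20):
    """Build a PubMed ESearch URL that ANDs the given terms.

    Hypothetical helper: the paper does not say how its Boolean
    queries are composed; the field tag here is an assumption.
    """
    query = " AND ".join(f'"{t}"[Title/Abstract]' for t in terms)
    params = {"db": "pubmed", "term": query, "retmax": retmax, "retmode": "json"}
    return f"{ESEARCH}?{urlencode(params)}"


def overlap_chunks(sentences, size=3, overlap=1):
    """Split a sentence list into windows of `size` sharing `overlap` sentences.

    Plain fixed-size overlap, used as a stand-in; it is not SEOS.
    """
    if not 0 <= overlap < size:
        raise ValueError("overlap must be in [0, size)")
    step = size - overlap
    chunks = []
    for start in range(0, len(sentences), step):
        chunks.append(sentences[start:start + size])
        if start + size >= len(sentences):
            break  # the last window already reaches the end of the text
    return chunks
```

Calling `build_esearch_url(["breast cancer", "tamoxifen"])` only constructs the request URL and sends nothing over the network. `overlap_chunks` keeps one shared sentence between neighbouring windows by default, so a statement that depends on the preceding sentence is not cut off from it at a chunk boundary.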
Anthology ID:
2026.findings-acl.1429
Volume:
Findings of the Association for Computational Linguistics: ACL 2026
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
28624–28637
URL:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1429/
Cite (ACL):
MaoLin He, Rena Wei Gao, Mike Conway, and Brian E. Chapman. 2026. Towards semantic reliable clinical QA: Query pipeline optimization for cancer patient question answering systems. In Findings of the Association for Computational Linguistics: ACL 2026, pages 28624–28637, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
Towards semantic reliable clinical QA: Query pipeline optimization for cancer patient question answering systems (He et al., Findings 2026)
PDF:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1429.pdf
Checklist:
 2026.findings-acl.1429.checklist.pdf