2025
VeReaFine: Iterative Verification Reasoning Refinement RAG for Hallucination-Resistant on Open-Ended Clinical QA
Pakawat Phasook, Rapepong Pitijaroonpong, Jiramet Kinchagawat, Amrest Chinkamol, Tossaporn Saengja, Kiartnarin Udomlapsakul, Jitkapat Sawatphol, Piyalitt Ittichaiwong
Proceedings of the 24th Workshop on Biomedical Language Processing (Shared Tasks)
We present VeReaFine, a novel “Verifier-RAG” pipeline designed to eliminate hallucinations in open-ended clinical question answering. VeReaFine interleaves three tightly coupled stages—retrieval, verification, and generation—across up to three iterations. First, a two-stage dense retriever (BM-Retriever-410M → BM-Reranker-2B) fetches and ranks top-k biomedical passages; an 8B-parameter MedReason verifier then filters these for direct relevance and identifies missing evidence. When the verifier deems the context insufficient, it formulates a focused “feedback query” to retrieve additional passages (bounded to prevent infinite loops). Once a minimal ground-truth context is assembled, a 7B-parameter generator (Qwen2.5-7B-Instruct) drafts an answer purely from that vetted context, and the verifier performs a final check—prompting the generator to refine any remaining unsupported claims. By iteratively fetching only missing facts and ensuring every assertion is evidence-backed, VeReaFine achieves monotonic factuality improvements with minimal overhead. On the BioNLP 2025 ClinIQLink “LLM Lie-Detector” shared task, our 7B generator augmented with VeReaFine matches or surpasses a 32B medical model on open-ended reasoning metrics, reducing multi-hop inverse step-identification errors by 26%. These findings demonstrate that moderate-size LLMs, when guided by targeted verification loops, can deliver expert-level reliability in clinical QA.
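The retrieve, verify and generate stages described above form a bounded control loop. The Python sketch below illustrates that loop under stated assumptions and is not the authors' code. The component callables (`retrieve`, `verify_context`, `generate`, `find_unsupported`, `refine`), the `Verdict` type and the toy metformin corpus are hypothetical stand-ins for BM-Retriever/BM-Reranker, the MedReason verifier and Qwen2.5-7B-Instruct.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Verdict:
    """Hypothetical verifier output for one retrieval round."""
    kept: List[str]                # passages judged directly relevant
    feedback_query: Optional[str]  # focused query for missing evidence; None if sufficient


def verify_refine_answer(
    question: str,
    retrieve: Callable[[str], List[str]],
    verify_context: Callable[[str, List[str]], Verdict],
    generate: Callable[[str, List[str]], str],
    find_unsupported: Callable[[str, List[str]], List[str]],
    refine: Callable[[str, List[str], List[str]], str],
    max_rounds: int = 3,
) -> str:
    """Bounded retrieve -> verify loop, then generate -> check -> refine (sketch only)."""
    context: List[str] = []
    query = question
    for _ in range(max_rounds):  # hard cap on rounds prevents endless feedback loops
        new = [p for p in retrieve(query) if p not in context]
        verdict = verify_context(question, context + new)
        context = verdict.kept  # keep only passages the verifier accepts
        if verdict.feedback_query is None:  # verifier judges the context sufficient
            break
        query = verdict.feedback_query  # next round fetches only the missing facts
    draft = generate(question, context)  # answer drawn only from the vetted context
    unsupported = find_unsupported(draft, context)  # final verifier check
    if unsupported:
        draft = refine(draft, context, unsupported)  # one refinement pass
    return draft


# --- Toy stand-ins (hypothetical) ---
CORPUS = {
    "metformin": "Metformin is first-line therapy for type 2 diabetes.",
    "lactic": "Metformin carries a rare risk of lactic acidosis.",
    "weather": "It rained in Bangkok yesterday.",
}
QUESTION = "What is metformin used for, and what is its main risk?"


def toy_retrieve(query: str) -> List[str]:
    # Keyword lookup standing in for the dense retriever + reranker.
    return [text for key, text in CORPUS.items() if key in query.lower()]


def toy_verify(question: str, passages: List[str]) -> Verdict:
    kept = [p for p in passages if "Metformin" in p]
    if not any("lactic acidosis" in p for p in kept):
        return Verdict(kept, "metformin lactic acidosis risk")  # ask for missing evidence
    return Verdict(kept, None)


def toy_generate(question: str, context: List[str]) -> str:
    # One claim per line; the last claim is deliberately unsupported.
    return "\n".join(context + ["Metformin cures hypertension."])


def toy_find_unsupported(draft: str, context: List[str]) -> List[str]:
    return [line for line in draft.splitlines() if line not in context]


def toy_refine(draft: str, context: List[str], unsupported: List[str]) -> str:
    return "\n".join(line for line in draft.splitlines() if line not in unsupported)


if __name__ == "__main__":
    print(verify_refine_answer(QUESTION, toy_retrieve, toy_verify, toy_generate,
                               toy_find_unsupported, toy_refine))
```

In the toy run, the first round misses the lactic acidosis passage. The feedback query retrieves it in the second round, and the refinement pass drops the unsupported hypertension claim, leaving only the two corpus sentences. Passing the components as callables keeps the loop independent of any particular retriever or model.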
2024
SICAR at RRG2024: GPU Poor’s Guide to Radiology Report Generation
Kiartnarin Udomlapsakul, Parinthapat Pengpun, Tossaporn Saengja, Kanyakorn Veerakanjana, Krittamate Tiankanon, Pitikorn Khlaisamniang, Pasit Supholkhan, Amrest Chinkamol, Pubordee Aussavavirojekul, Hirunkul Phimsiri, Tara Sripo, Chiraphat Boonnag, Trongtum Tongdee, Thanongchai Siriapisith, Pairash Saiviroonporn, Jiramet Kinchagawat, Piyalitt Ittichaiwong
Proceedings of the 23rd Workshop on Biomedical Natural Language Processing
Radiology report generation (RRG) aims to create free-text radiology reports from clinical imaging. Our solution employs a lightweight multimodal language model (MLLM) enhanced with a two-stage post-processing strategy, utilizing a Large Language Model (LLM) to boost diagnostic accuracy and ensure patient safety. We introduce the “First, Do No Harm” SafetyNet, which incorporates Xraydar, an advanced X-ray classification model, to cross-verify the model outputs and specifically address false negatives from the MLLM. This comprehensive approach combines the efficiency of lightweight models with the robustness of thorough post-processing techniques, offering a reliable solution for radiology report generation. Our system achieved fourth place on the F1-Radgraph metric for findings generation in the Radiology Report Generation Shared Task (RRG24).
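The SafetyNet idea of using a classifier to catch false negatives in a generated report can be sketched as a simple comparison between classifier outputs and the report text. This Python example is hypothetical. `probs` stands in for per-finding probabilities from an image classifier such as Xraydar, whose real interface is not described here. The label names, the 0.8 threshold and the templated note are also illustrative assumptions. In the described system an LLM performs the post-processing, not a fixed template.

```python
from typing import Dict, List


def find_missed_findings(report: str, probs: Dict[str, float],
                         threshold: float = 0.8) -> List[str]:
    """Labels the classifier flags (prob >= threshold) that the report never mentions.

    Naive case-insensitive substring match: it ignores negation, so a report
    saying "no pleural effusion" counts as mentioning pleural effusion.
    """
    text = report.lower()
    return sorted(label for label, p in probs.items()
                  if p >= threshold and label.lower() not in text)


def apply_safety_net(report: str, probs: Dict[str, float],
                     threshold: float = 0.8) -> str:
    """Append a templated note for each likely false negative (hypothetical policy)."""
    missed = find_missed_findings(report, probs, threshold)
    if not missed:
        return report
    notes = " ".join(f"Possible {label.lower()}; correlate clinically." for label in missed)
    return f"{report} {notes}"


if __name__ == "__main__":
    report = "Heart size is normal. Lungs are clear."
    probs = {"Cardiomegaly": 0.12, "Pleural effusion": 0.91, "Pneumothorax": 0.85}
    print(apply_safety_net(report, probs))
```

Here the two high-probability findings absent from the report are appended, and the low-probability one is ignored. A practical version would also need negation handling and a policy for contradictions between the report and the classifier.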
On Creating an English-Thai Code-switched Machine Translation in Medical Domain
Parinthapat Pengpun, Krittamate Tiankanon, Amrest Chinkamol, Jiramet Kinchagawat, Pitchaya Chairuengjitjaras, Pasit Supholkhan, Pubordee Aussavavirojekul, Chiraphat Boonnag, Kanyakorn Veerakanjana, Hirunkul Phimsiri, Boonthicha Sae-jia, Nattawach Sataudom, Piyalitt Ittichaiwong, Peerat Limkonchotiwat
Findings of the Association for Computational Linguistics: EMNLP 2024
Machine translation (MT) in the medical domain plays a pivotal role in enhancing healthcare quality and disseminating medical knowledge. Despite advancements in English-Thai MT technology, common MT approaches often underperform in the medical field because they cannot precisely translate medical terminology. Our research prioritizes not merely improving translation accuracy but also maintaining medical terminology in English within the translated text through code-switched (CS) translation. We developed a method to produce CS medical translation data, fine-tuned a CS translation model with this data, and evaluated its performance against strong baselines, such as Google Neural Machine Translation (NMT) and GPT-3.5/GPT-4. Our model demonstrated competitive performance in automatic metrics and was highly favored in human preference evaluations. Our evaluation results also show that medical professionals significantly prefer CS translations that maintain critical English terms accurately, even if this slightly compromises fluency. Our code and test set are publicly available at https://github.com/preceptorai-org/NLLB_CS_EM_NLP2024.