Roland Hangelbroek
2025
TripleCheck: Transparent Post-Hoc Verification of Biomedical Claims in AI-Generated Answers
Ana Valeria González | Sidsel Boldsen | Roland Hangelbroek
Proceedings of the Fourth Workshop on Bridging Human-Computer Interaction and Natural Language Processing (HCI+NLP)
Retrieval Augmented Generation (RAG) has advanced Question Answering (QA) by connecting Large Language Models (LLMs) to external knowledge. However, these systems can still produce answers that are unsupported, lack clear traceability, or misattribute information. This is a critical issue in the biomedical domain, where accuracy, trust, and control are essential. We introduce TripleCheck, a post-hoc framework that breaks down an LLM’s answer into factual triples and checks each against both the retrieved context and a biomedical knowledge graph. By highlighting which statements are supported, traceable, or correctly attributed, TripleCheck enables users to spot gaps, unsupported claims, and misattributions, prompting more careful follow-up. We present the TripleCheck framework, evaluate it on the SciFact benchmark, analyze its limitations, and share preliminary expert feedback. Results show that TripleCheck provides nuanced insight, potentially supporting greater trust and safer AI adoption in biomedical applications.
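The idea of checking each triple against two sources can be illustrated with a minimal sketch. This is not the authors' implementation: the names `Triple`, `in_context`, and `check_triple` are hypothetical, the triples are assumed to be already extracted from the answer, the context check is a naive substring match, and the knowledge graph is stood in for by a plain set of triples. The data is a toy example chosen only to show one supported and one unsupported claim.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """Hypothetical (subject, relation, object) claim extracted from an answer."""
    subject: str
    relation: str
    obj: str


def in_context(triple: Triple, context: str) -> bool:
    """Naive stand-in for context verification: both entities occur in the text."""
    text = context.lower()
    return triple.subject.lower() in text and triple.obj.lower() in text


def check_triple(triple: Triple, context: str, kg: set) -> dict:
    """Report, per source, whether the triple is supported."""
    return {
        "triple": triple,
        "context": in_context(triple, context),  # found in retrieved passage?
        "kg": triple in kg,                       # exact match in the toy graph?
    }


# Toy inputs (assumptions for illustration, not data from the paper).
context = "Aspirin irreversibly inhibits COX-1 and COX-2."
kg = {Triple("aspirin", "inhibits", "COX-1")}
triples = [
    Triple("aspirin", "inhibits", "COX-1"),
    Triple("aspirin", "treats", "diabetes"),
]

report = [check_triple(t, context, kg) for t in triples]
for row in report:
    print(row["triple"], "context:", row["context"], "kg:", row["kg"])
```

Keeping the two verdicts separate, rather than merging them into one score, mirrors the abstract's emphasis on showing users where support for a claim comes from. A real system would presumably replace the substring match and set lookup with entity linking and graph queries.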