imapScore: Medical Fact Evaluation Made Easy

Huimin Wang; Yutian Zhao; Xian Wu; Yefeng Zheng

doi:10.18653/v1/2024.findings-acl.610

Abstract

Automatic evaluation of natural language generation (NLG) tasks has attracted extensive research interest, since it can rapidly assess the performance of large language models (LLMs). However, automatic NLG evaluation struggles with medical QA because it fails to focus on the crucial correctness of medical facts throughout the generated text. To address this, this paper introduces a new data structure, imap, designed to capture key information in questions and answers, enabling evaluators to focus on essential details. The imap comprises three components: Query, Constraint, and Inform, each of which takes the form of term-value pairs that represent medical facts in a structured manner. We then introduce imapScore, which compares the corresponding medical term-value pairs in the imap to score generated texts. We utilize GPT-4 to extract imaps from questions, human-annotated answers, and generated responses. To mitigate the diversity in medical terminology and enable a fair comparison of term-value pairs, we use a medical knowledge graph to assist GPT-4 in determining matches. To compare imapScore with existing NLG metrics, we establish a new benchmark dataset. The experimental results show that imapScore consistently outperforms state-of-the-art metrics, demonstrating an average improvement of 79.8% in correlation with human scores. Furthermore, incorporating imap into n-gram, embedding, and LLM metrics boosts the base versions, increasing correlation with human scores by averages of 89.9%, 81.7%, and 32.6%, respectively.
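
To make the imap idea concrete, here is a minimal Python sketch of a structure with Query, Constraint, and Inform components, each holding term-value pairs, together with a toy score that compares pairs between a reference and a generated answer. Everything in it is an assumption for illustration: the names `Imap`, `component_f1`, and `toy_imap_score`, the per-component F1, the unweighted average, and the example values are not taken from the paper. In particular, the paper determines matches with GPT-4 assisted by a medical knowledge graph, whereas this sketch substitutes exact string matching after simple normalization.

```python
from dataclasses import dataclass, field


@dataclass
class Imap:
    """Hypothetical container: three components of term-value pairs."""
    query: dict[str, str] = field(default_factory=dict)
    constraint: dict[str, str] = field(default_factory=dict)
    inform: dict[str, str] = field(default_factory=dict)


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace (stand-in for real term matching)."""
    return " ".join(text.lower().split())


def component_f1(reference: dict[str, str], candidate: dict[str, str]) -> float:
    """F1 over term-value pairs that match exactly after normalization.

    Assumption: the paper's actual matching uses GPT-4 plus a knowledge
    graph; exact matching is used here only to keep the sketch runnable.
    """
    ref = {normalize(t): normalize(v) for t, v in reference.items()}
    cand = {normalize(t): normalize(v) for t, v in candidate.items()}
    if not ref and not cand:
        return 1.0  # nothing to state, nothing stated
    hits = sum(1 for term, value in cand.items() if ref.get(term) == value)
    if hits == 0:
        return 0.0
    precision = hits / len(cand)
    recall = hits / len(ref)
    return 2 * precision * recall / (precision + recall)


def toy_imap_score(reference: Imap, candidate: Imap) -> float:
    """Unweighted mean of the three component scores (an assumed aggregation)."""
    parts = [
        component_f1(reference.query, candidate.query),
        component_f1(reference.constraint, candidate.constraint),
        component_f1(reference.inform, candidate.inform),
    ]
    return sum(parts) / len(parts)


# Invented example values, for illustration only (not medical guidance).
reference = Imap(
    query={"asks about": "antibiotic dosage"},
    constraint={"patient age": "adult"},
    inform={"drug": "amoxicillin", "dose": "500 mg", "frequency": "three times daily"},
)
candidate = Imap(
    query={"asks about": "antibiotic dosage"},
    constraint={"patient age": "adult"},
    inform={"drug": "Amoxicillin", "dose": "250 mg"},
)

print(round(toy_imap_score(reference, candidate), 2))  # prints 0.8
```

In this toy run the generated answer matches the Query and Constraint pairs but states a different dose and omits the frequency, so only the Inform component is penalized. Exact matching would also miss synonymous medical terms, which is the gap the knowledge-graph-assisted matching described in the abstract is meant to close.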

Anthology ID: 2024.findings-acl.610
Volume: Findings of the Association for Computational Linguistics: ACL 2024
Month: August
Year: 2024
Address: Bangkok, Thailand
Editors: Lun-Wei Ku, Andre Martins, Vivek Srikumar
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 10242–10257
URL: https://preview.aclanthology.org/Ingest-2025-COMPUTEL/2024.findings-acl.610/
DOI: 10.18653/v1/2024.findings-acl.610
Cite (ACL): Huimin Wang, Yutian Zhao, Xian Wu, and Yefeng Zheng. 2024. imapScore: Medical Fact Evaluation Made Easy. In Findings of the Association for Computational Linguistics: ACL 2024, pages 10242–10257, Bangkok, Thailand. Association for Computational Linguistics.
Cite (Informal): imapScore: Medical Fact Evaluation Made Easy (Wang et al., Findings 2024)
PDF: https://preview.aclanthology.org/Ingest-2025-COMPUTEL/2024.findings-acl.610.pdf
