Detection, Diagnosis, and Explanation: A Benchmark for Chinese Medial Hallucination Evaluation
Chengfeng Dou, Ying Zhang, Yanyuan Chen, Zhi Jin, Wenpin Jiao, Haiyan Zhao, Yu Huang
Abstract
Large Language Models (LLMs) have made significant progress recently. However, their practical use in healthcare is hindered by their tendency to generate hallucinations. One specific type, called snowballing hallucination, occurs when LLMs encounter misleading information, and it poses a security threat to LLMs. To understand how well LLMs can resist these hallucinations, we create the Chinese Medical Hallucination Evaluation benchmark (CMHE). This benchmark can be used to evaluate LLMs' ability to detect medical hallucinations, make accurate diagnoses in noisy conditions, and provide plausible explanations. The benchmark is constructed through a combination of manual and model-based approaches. In addition, we use two specialized glossaries, ICD-10 and MeSH, to aid in the evaluation. Our experiments show that LLMs struggle to identify fake medical terms and make poor diagnoses in distracting environments. However, improving a model's understanding of medical concepts can help it resist interference to some extent.
- Anthology ID:
- 2024.lrec-main.428
- Volume:
- Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
- Month:
- May
- Year:
- 2024
- Address:
- Torino, Italia
- Editors:
- Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
- Venues:
- LREC | COLING
- Publisher:
- ELRA and ICCL
- Pages:
- 4784–4794
- URL:
- https://aclanthology.org/2024.lrec-main.428
- Cite (ACL):
- Chengfeng Dou, Ying Zhang, Yanyuan Chen, Zhi Jin, Wenpin Jiao, Haiyan Zhao, and Yu Huang. 2024. Detection, Diagnosis, and Explanation: A Benchmark for Chinese Medial Hallucination Evaluation. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 4784–4794, Torino, Italia. ELRA and ICCL.
- Cite (Informal):
- Detection, Diagnosis, and Explanation: A Benchmark for Chinese Medial Hallucination Evaluation (Dou et al., LREC-COLING 2024)
- PDF:
- https://preview.aclanthology.org/corrections-2024-05/2024.lrec-main.428.pdf