Lixian Lai
2025
Should I Believe in What Medical AI Says? A Chinese Benchmark for Medication Based on Knowledge and Reasoning
Yue Wu | Yangmin Huang | Qianyun Du | Lixian Lai | Zhiyang He | Jiaxue Hu | Xiaodong Tao
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
Large language models (LLMs) show potential in healthcare but often generate hallucinations, especially when handling unfamiliar information. In medication, a systematic benchmark to evaluate model capabilities is lacking, which is critical given the high-risk nature of medical information. This paper introduces a Chinese benchmark aimed at assessing models in medication tasks, focusing on knowledge and reasoning across six datasets: indication, dosage and administration, contraindicated population, mechanisms of action, drug recommendation, and drug interaction. We evaluate eight closed-source and five open-source models to identify knowledge boundaries, providing the first systematic analysis of limitations and risks in proprietary medical models.
Co-authors
- Qianyun Du 1
- Zhiyang He 1
- Jiaxue Hu 1
- Yangmin Huang 1
- Xiaodong Tao 1
- Yue Wu 1
Venues
- ACL 1