MedEthicEval: Evaluating Large Language Models Based on Chinese Medical Ethics

Haoan Jin, Jiacheng Shi, Hanhui Xu, Kenny Q. Zhu, Mengyue Wu


Abstract
Large language models (LLMs) demonstrate significant potential in advancing medical applications, yet their capabilities in addressing medical ethics challenges remain underexplored. This paper introduces MedEthicEval, a novel benchmark designed to systematically evaluate LLMs in the domain of medical ethics. Our framework encompasses two key components: knowledge, assessing the models’ grasp of medical ethics principles, and application, focusing on their ability to apply these principles across diverse scenarios. To support this benchmark, we consulted with medical ethics researchers and developed three datasets addressing distinct ethical challenges: blatant violations of medical ethics, priority dilemmas with clear inclinations, and equilibrium dilemmas without obvious resolutions. MedEthicEval serves as a critical tool for understanding LLMs’ ethical reasoning in healthcare, paving the way for their responsible and effective use in medical contexts.
Anthology ID:
2025.naacl-industry.34
Volume:
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 3: Industry Track)
Month:
April
Year:
2025
Address:
Albuquerque, New Mexico
Editors:
Weizhu Chen, Yi Yang, Mohammad Kachuee, Xue-Yong Fu
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
404–421
URL:
https://preview.aclanthology.org/corrections-2025-06/2025.naacl-industry.34/
DOI:
10.18653/v1/2025.naacl-industry.34
Cite (ACL):
Haoan Jin, Jiacheng Shi, Hanhui Xu, Kenny Q. Zhu, and Mengyue Wu. 2025. MedEthicEval: Evaluating Large Language Models Based on Chinese Medical Ethics. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 3: Industry Track), pages 404–421, Albuquerque, New Mexico. Association for Computational Linguistics.
Cite (Informal):
MedEthicEval: Evaluating Large Language Models Based on Chinese Medical Ethics (Jin et al., NAACL 2025)
PDF:
https://preview.aclanthology.org/corrections-2025-06/2025.naacl-industry.34.pdf