MedEthicEval: Evaluating Large Language Models Based on Chinese Medical Ethics
Haoan Jin, Jiacheng Shi, Hanhui Xu, Kenny Q. Zhu, Mengyue Wu
Abstract
Large language models (LLMs) demonstrate significant potential in advancing medical applications, yet their capabilities in addressing medical ethics challenges remain underexplored. This paper introduces MedEthicEval, a novel benchmark designed to systematically evaluate LLMs in the domain of medical ethics. Our framework encompasses two key components: knowledge, assessing the models’ grasp of medical ethics principles, and application, focusing on their ability to apply these principles across diverse scenarios. To support this benchmark, we consulted with medical ethics researchers and developed three datasets addressing distinct ethical challenges: blatant violations of medical ethics, priority dilemmas with clear inclinations, and equilibrium dilemmas without obvious resolutions. MedEthicEval serves as a critical tool for understanding LLMs’ ethical reasoning in healthcare, paving the way for their responsible and effective use in medical contexts.
- Anthology ID: 2025.naacl-industry.34
- Volume: Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 3: Industry Track)
- Month: April
- Year: 2025
- Address: Albuquerque, New Mexico
- Editors: Weizhu Chen, Yi Yang, Mohammad Kachuee, Xue-Yong Fu
- Venue: NAACL
- Publisher: Association for Computational Linguistics
- Pages: 404–421
- URL: https://preview.aclanthology.org/corrections-2025-06/2025.naacl-industry.34/
- DOI: 10.18653/v1/2025.naacl-industry.34
- Cite (ACL): Haoan Jin, Jiacheng Shi, Hanhui Xu, Kenny Q. Zhu, and Mengyue Wu. 2025. MedEthicEval: Evaluating Large Language Models Based on Chinese Medical Ethics. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 3: Industry Track), pages 404–421, Albuquerque, New Mexico. Association for Computational Linguistics.
- Cite (Informal): MedEthicEval: Evaluating Large Language Models Based on Chinese Medical Ethics (Jin et al., NAACL 2025)
- PDF: https://preview.aclanthology.org/corrections-2025-06/2025.naacl-industry.34.pdf