Health-ORSC-Bench: A Benchmark for Measuring Over-Refusal and Safety Completion in Health Context

Zhihao Zhang, Liting Huang, Guanghao Wu, Preslav Nakov, Heng Ji, Usman Naseem


Abstract
Safety alignment in Large Language Models (LLMs) is critical for healthcare; however, reliance on binary refusal boundaries often results in over-refusal of benign queries or unsafe compliance with harmful ones. While existing benchmarks measure these extremes, they fail to evaluate Safe Completion: the model’s ability to maximise helpfulness on dual-use or borderline queries by providing safe, high-level guidance without crossing into actionable harm. We introduce Health-ORSC-Bench, the first large-scale benchmark designed to systematically measure Over-Refusal and Safe Completion quality in healthcare. Comprising 31,920 benign boundary prompts across seven health categories (e.g., self-harm, medical misinformation), our framework uses an automated pipeline with human validation to test models at varying levels of intent ambiguity. We evaluate 30 state-of-the-art LLMs, including GPT-5 and Claude-4, revealing a significant tension: safety-optimised models frequently refuse up to 80% of "Hard" benign prompts, while domain-specific models often sacrifice safety for utility. Our findings demonstrate that model family and size significantly influence calibration: larger frontier models (e.g., GPT-5, Llama-4) exhibit "safety-pessimism" and higher over-refusal than smaller or MoE-based counterparts (e.g., Qwen-3-Next), highlighting that current LLMs struggle to balance refusal and compliance. Health-ORSC-Bench provides a rigorous standard for calibrating the next generation of medical AI assistants toward nuanced, safe, and helpful completions. Our code and data are available at: https://github.com/ZhihaoZhang97/Health-ORSC-Bench. Warning: Some content may be toxic or otherwise undesirable.
Anthology ID:
2026.findings-acl.1177
Volume:
Findings of the Association for Computational Linguistics: ACL 2026
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
23525–23547
URL:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1177/
Cite (ACL):
Zhihao Zhang, Liting Huang, Guanghao Wu, Preslav Nakov, Heng Ji, and Usman Naseem. 2026. Health-ORSC-Bench: A Benchmark for Measuring Over-Refusal and Safety Completion in Health Context. In Findings of the Association for Computational Linguistics: ACL 2026, pages 23525–23547, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
Health-ORSC-Bench: A Benchmark for Measuring Over-Refusal and Safety Completion in Health Context (Zhang et al., Findings 2026)
PDF:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1177.pdf
Checklist:
 2026.findings-acl.1177.checklist.pdf