Let Them Down Easy! Contextual Effects of LLM Guardrails on User Perceptions and Preferences

Mingqian Zheng, Wenjia Hu, Patrick Zhao, Motahhare Eslami, Jena D. Hwang, Faeze Brahman, Carolyn Rose, Maarten Sap


Abstract
Current LLMs are trained to refuse potentially harmful input queries regardless of whether users actually had harmful intents, causing a tradeoff between safety and user experience. Through a study of 480 participants evaluating 3,840 query-response pairs, we examine how different refusal strategies affect user perceptions across varying motivations. Our findings reveal that response strategy largely shapes user experience, while actual user motivation has negligible impact. Partial compliance—providing general information without actionable details—emerges as the optimal strategy, reducing negative user perceptions by over 50% compared to flat-out refusals. Complementing this, we analyze response patterns of 9 state-of-the-art LLMs and evaluate how 6 reward models score different refusal strategies, demonstrating that models rarely deploy partial compliance naturally and that reward models currently undervalue it. This work demonstrates that effective guardrails require focusing on crafting thoughtful refusals rather than detecting intent, offering a path toward AI safety mechanisms that ensure both safety and sustained user engagement.
Anthology ID:
2025.findings-emnlp.630
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2025
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
Findings
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
11742–11772
Language:
URL:
https://preview.aclanthology.org/name-variant-enfa-fane/2025.findings-emnlp.630/
DOI:
10.18653/v1/2025.findings-emnlp.630
Bibkey:
Cite (ACL):
Mingqian Zheng, Wenjia Hu, Patrick Zhao, Motahhare Eslami, Jena D. Hwang, Faeze Brahman, Carolyn Rose, and Maarten Sap. 2025. Let Them Down Easy! Contextual Effects of LLM Guardrails on User Perceptions and Preferences. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 11742–11772, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
Let Them Down Easy! Contextual Effects of LLM Guardrails on User Perceptions and Preferences (Zheng et al., Findings 2025)
PDF:
https://preview.aclanthology.org/name-variant-enfa-fane/2025.findings-emnlp.630.pdf
Checklist:
2025.findings-emnlp.630.checklist.pdf