Is Safety Standard Same for Everyone? User-Specific Safety Evaluation of Large Language Models

Yeonjun In; Wonjoong Kim; Kanghoon Yoon; Sungchul Kim; Mehrab Tanjim; Sangwu Park; Kibum Kim; Chanyoung Park

doi:10.18653/v1/2025.findings-emnlp.353

Abstract

As the use of large language model (LLM) agents continues to grow, their safety vulnerabilities have become increasingly evident. Extensive benchmarks evaluate various aspects of LLM safety, but they define safety according to general standards and overlook user-specific standards. However, safety standards for LLMs may vary with user-specific profiles rather than being universally consistent across all users. This raises a critical research question: do LLM agents act safely when user-specific safety standards are taken into account? Despite its importance for safe LLM use, no benchmark dataset currently exists to evaluate the user-specific safety of LLMs. To address this gap, we introduce U-SafeBench, a benchmark designed to assess the user-specific aspect of LLM safety. Our evaluation of 20 widely used LLMs reveals that current LLMs fail to act safely when considering user-specific safety standards, a new finding in this field. To address this vulnerability, we propose a simple remedy based on chain-of-thought and demonstrate its effectiveness in improving user-specific safety.

Anthology ID:: 2025.findings-emnlp.353
Volume:: Findings of the Association for Computational Linguistics: EMNLP 2025
Month:: November
Year:: 2025
Address:: Suzhou, China
Editors:: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:: Findings
Publisher:: Association for Computational Linguistics
Pages:: 6652–6671
URL:: https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.353/
DOI:: 10.18653/v1/2025.findings-emnlp.353
Cite (ACL):: Yeonjun In, Wonjoong Kim, Kanghoon Yoon, Sungchul Kim, Mehrab Tanjim, Sangwu Park, Kibum Kim, and Chanyoung Park. 2025. Is Safety Standard Same for Everyone? User-Specific Safety Evaluation of Large Language Models. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 6652–6671, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):: Is Safety Standard Same for Everyone? User-Specific Safety Evaluation of Large Language Models (In et al., Findings 2025)
PDF:: https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.353.pdf
Checklist:: 2025.findings-emnlp.353.checklist.pdf