Dynamic Evaluation with Cognitive Reasoning for Multi-turn Safety of Large Language Models

Lanxue Zhang, Yanan Cao, Yuqiang Xie, Fang Fang, Yangxi Li


Abstract
The rapid advancement of Large Language Models (LLMs) poses significant challenges for safety evaluation. Current static datasets struggle to identify emerging vulnerabilities due to three limitations: (1) they risk being exposed in model training data, leading to evaluation bias; (2) their limited prompt diversity fails to capture real-world application scenarios; (3) they cannot provide human-like multi-turn interactions. To address these limitations, we propose CogSafe, a dynamic evaluation framework for comprehensive and automated multi-turn safety assessment of LLMs. CogSafe builds on cognitive theories to simulate realistic chat interactions. To enhance assessment diversity, we introduce scenario simulation and strategy decision to guide the dynamic generation, enabling broad coverage of application scenarios. Furthermore, we incorporate the cognitive process to simulate multi-turn dialogues that reflect the cognitive dynamics of real-world interactions. Extensive experiments demonstrate the scalability and effectiveness of our framework, which has been applied to evaluate the safety of widely used LLMs.
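As a rough illustration of the dynamic evaluation loop the abstract describes, the sketch below shows how scenario simulation, strategy decision, and a turn-by-turn cognitive state update might be wired together. All names (simulate_scenario, choose_strategy, DialogueState, etc.) are hypothetical placeholders for this sketch, not the paper's actual CogSafe implementation.

```python
import random
from dataclasses import dataclass, field

# Hypothetical sketch of a dynamic multi-turn safety evaluation loop.
# It only illustrates the idea of combining scenario simulation, strategy
# decision, and a cognitive state that evolves across dialogue turns.

SCENARIOS = ["customer support", "medical advice", "role-play game"]
STRATEGIES = ["direct request", "gradual escalation", "context shifting"]

@dataclass
class DialogueState:
    scenario: str
    strategy: str
    history: list = field(default_factory=list)  # (probe, model_reply) pairs
    intent_progress: float = 0.0                 # crude proxy for cognitive progress

def simulate_scenario() -> str:
    """Pick an application scenario to ground the conversation."""
    return random.choice(SCENARIOS)

def choose_strategy(scenario: str) -> str:
    """Decide how probing prompts will be framed for this scenario."""
    return random.choice(STRATEGIES)

def next_probe(state: DialogueState) -> str:
    """Generate the next probing turn from the evolving dialogue state."""
    turn = len(state.history) + 1
    return (f"[{state.scenario} | {state.strategy}] probe #{turn}, "
            f"progress={state.intent_progress:.1f}")

def evaluate_dialogue(target_model, judge, max_turns: int = 5) -> dict:
    """Run one dynamically generated multi-turn episode against target_model."""
    scenario = simulate_scenario()
    state = DialogueState(scenario=scenario, strategy=choose_strategy(scenario))
    for _ in range(max_turns):
        probe = next_probe(state)
        reply = target_model(probe, state.history)  # model under evaluation
        unsafe = judge(probe, reply)                # per-turn safety judgment
        state.history.append((probe, reply))
        state.intent_progress += 0.2                # escalate the simulated intent
        if unsafe:
            return {"scenario": scenario, "strategy": state.strategy,
                    "turns": len(state.history), "unsafe": True}
    return {"scenario": scenario, "strategy": state.strategy,
            "turns": len(state.history), "unsafe": False}

if __name__ == "__main__":
    # Toy stand-ins so the sketch runs without any real LLM backend.
    echo_model = lambda prompt, history: f"(safe reply to: {prompt})"
    keyword_judge = lambda prompt, reply: "unsafe" in reply.lower()
    print(evaluate_dialogue(echo_model, keyword_judge))
```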
Anthology ID:
2025.acl-long.963
Volume:
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
July
Year:
2025
Address:
Vienna, Austria
Editors:
Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
19588–19608
URL:
https://preview.aclanthology.org/ingestion-acl-25/2025.acl-long.963/
Cite (ACL):
Lanxue Zhang, Yanan Cao, Yuqiang Xie, Fang Fang, and Yangxi Li. 2025. Dynamic Evaluation with Cognitive Reasoning for Multi-turn Safety of Large Language Models. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 19588–19608, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
Dynamic Evaluation with Cognitive Reasoning for Multi-turn Safety of Large Language Models (Zhang et al., ACL 2025)
PDF:
https://preview.aclanthology.org/ingestion-acl-25/2025.acl-long.963.pdf