Crab: A Novel Configurable Role-Playing LLM with Assessing Benchmark

Kai He, Yucheng Huang, Wenqing Wang, Delong Ran, Dongming Sheng, Junxuan Huang, Qika Lin, Jiaxing Xu, Wenqiang Liu, Mengling Feng


Abstract
This study introduces Crab, a novel Configurable Role-Playing (RP) LLM with Assessing Benchmark, which consists of Role-Centric Dataset Curation, Persona-Embodying LLM Construction, and Comprehensive Benchmark Creation for RP dialogue generation. Distinct from traditional RP models that employ only several preset roles, Crab enables dynamic configuration of desired roles, thereby enhancing related flexibility and adaptability. To effectively train RP-LLMs, we curated the largest RP training dataset. The dataset provides a detailed role overview for each dialogue, including character profile, conversation scenario, and tagged topic, capturing a broad range of role-based behaviors, emotions, and interactions. We also noticed that current benchmarks lack both proper evaluation standards and methods. Thus, to validate RP-LLMs’ effectiveness, we introduced a new benchmark containing an evaluation standard, a test dataset with manual annotations, and a reward model RoleRM designed to automatically assess specific aspects of RP while aligning with human perception. Sufficient experiments reveal that RoleRM significantly outperforms ChatGPT and other evaluation methods in conducting fine-grained evaluations of RP. Also, RP-LLMs powered by Crab demonstrate superior performance across various fine-grained aspects.
Anthology ID:
2025.acl-long.731
Volume:
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
July
Year:
2025
Address:
Vienna, Austria
Editors:
Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue:
ACL
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
15030–15052
Language:
URL:
https://preview.aclanthology.org/ingestion-acl-25/2025.acl-long.731/
DOI:
Bibkey:
Cite (ACL):
Kai He, Yucheng Huang, Wenqing Wang, Delong Ran, Dongming Sheng, Junxuan Huang, Qika Lin, Jiaxing Xu, Wenqiang Liu, and Mengling Feng. 2025. Crab: A Novel Configurable Role-Playing LLM with Assessing Benchmark. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 15030–15052, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
Crab: A Novel Configurable Role-Playing LLM with Assessing Benchmark (He et al., ACL 2025)
Copy Citation:
PDF:
https://preview.aclanthology.org/ingestion-acl-25/2025.acl-long.731.pdf