Crab: A Novel Configurable Role-Playing LLM with Assessing Benchmark

Kai He | Yucheng Huang | Wenqing Wang | Delong Ran | Dongming Sheng | Junxuan Huang | Qika Lin | Jiaxing Xu | Wenqiang Liu | Mengling Feng

2025

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
This study introduces Crab, a novel Configurable Role-Playing (RP) LLM with an Assessing Benchmark, which consists of Role-Centric Dataset Curation, Persona-Embodying LLM Construction, and Comprehensive Benchmark Creation for RP dialogue generation. Unlike traditional RP models that support only a handful of preset roles, Crab enables dynamic configuration of desired roles, thereby enhancing flexibility and adaptability. To effectively train RP-LLMs, we curated the largest RP training dataset to date. The dataset provides a detailed role overview for each dialogue, including a character profile, conversation scenario, and tagged topic, capturing a broad range of role-based behaviors, emotions, and interactions. We also observe that current benchmarks lack both proper evaluation standards and methods. Thus, to validate the effectiveness of RP-LLMs, we introduce a new benchmark comprising an evaluation standard, a test dataset with manual annotations, and a reward model, RoleRM, designed to automatically assess specific aspects of RP while aligning with human perception. Extensive experiments reveal that RoleRM significantly outperforms ChatGPT and other evaluation methods in conducting fine-grained evaluations of RP. Moreover, RP-LLMs powered by Crab demonstrate superior performance across various fine-grained aspects.