Ruishi Zou
2025
Towards a Design Guideline for RPA Evaluation: A Survey of Large Language Model-Based Role-Playing Agents
Chaoran Chen | Bingsheng Yao | Ruishi Zou | Wenyue Hua | Weimin Lyu | Toby Jia-Jun Li | Dakuo Wang
Findings of the Association for Computational Linguistics: ACL 2025
A Role-Playing Agent (RPA) is an increasingly popular type of LLM agent that simulates human-like behaviors in a variety of tasks. However, evaluating RPAs is challenging due to diverse task requirements and agent designs. This paper proposes an evidence-based, actionable, and generalizable evaluation design guideline for LLM-based RPAs by systematically reviewing 1,676 papers published between Jan. 2021 and Dec. 2024. Our analysis identifies six agent attributes, seven task attributes, and seven evaluation metrics from existing literature. Based on these findings, we present an RPA evaluation design guideline to help researchers develop more systematic and consistent evaluation methods.
2024
More Samples or More Prompts? Exploring Effective Few-Shot In-Context Learning for LLMs with In-Context Sampling
Bingsheng Yao | Guiming Chen | Ruishi Zou | Yuxuan Lu | Jiachen Li | Shao Zhang | Yisi Sang | Sijia Liu | James Hendler | Dakuo Wang
Findings of the Association for Computational Linguistics: NAACL 2024
While most existing works on LLM prompting techniques focus only on how to select a better set of data samples inside a single prompt input (In-Context Learning, or ICL), why can we not design and leverage multiple prompts together to further improve the LLM's performance? In this work, we propose In-Context Sampling (ICS), a low-resource LLM prompting technique that produces confident predictions by optimizing the construction of multiple ICL prompt inputs. Extensive experiments with three open-source LLMs (FlanT5-XL, Mistral-7B, and Mixtral-8x7B) on four NLI datasets (e-SNLI, Multi-NLI, ANLI, and Contract-NLI) and one QA dataset (CommonsenseQA) illustrate that ICS can consistently enhance LLMs' performance. An in-depth evaluation with three data similarity-based ICS strategies suggests that these strategies can further elevate LLMs' performance, which sheds light on a new yet promising future research direction.
Co-authors
- Dakuo Wang 2
- Bingsheng Yao 2
- Guiming Chen 1
- Chaoran Chen 1
- James Hendler 1