Shang Wu
2025
ChatSOP: An SOP-Guided MCTS Planning Framework for Controllable LLM Dialogue Agents
Zhigen Li | Jianxiang Peng | Yanmeng Wang | Yong Cao | Tianhao Shen | Minghui Zhang | Linxi Su | Shang Wu | Yihang Wu | YuQian Wang | Ye Wang | Wei Hu | Jianfeng Li | Shaojun Wang | Jing Xiao | Deyi Xiong
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Dialogue agents powered by Large Language Models (LLMs) show superior performance in various tasks. Despite their strong user understanding and human-like responses, their lack of controllability remains a key challenge, often leading to unfocused conversations or task failure. To address this, we introduce Standard Operating Procedures (SOPs) to regulate dialogue flow. Specifically, we propose ChatSOP, a novel SOP-guided Monte Carlo Tree Search (MCTS) planning framework designed to enhance the controllability of LLM-driven dialogue agents. To enable this, we curate a dataset of SOP-annotated multi-scenario dialogues, generated with a semi-automated role-playing system using GPT-4o and validated through strict manual quality control. Additionally, we propose a novel method that integrates Chain of Thought reasoning with supervised fine-tuning for SOP prediction and uses SOP-guided Monte Carlo Tree Search for optimal action planning during dialogues. Experimental results demonstrate the effectiveness of our method: it achieves a 27.95% improvement in action accuracy over baseline models based on GPT-3.5 and also shows notable gains for open-source models. Dataset and codes are publicly available.
2024
CMoralEval: A Moral Evaluation Benchmark for Chinese Large Language Models
Linhao Yu | Yongqi Leng | Yufei Huang | Shang Wu | Haixin Liu | Xinmeng Ji | Jiahui Zhao | Jinwang Song | Tingting Cui | Xiaoqing Cheng | Liutao Liutao | Deyi Xiong
Findings of the Association for Computational Linguistics: ACL 2024
How would a large language model (LLM) respond in an ethically relevant context? In this paper, we curate a large benchmark, CMoralEval, for the morality evaluation of Chinese LLMs. The data sources of CMoralEval are two-fold: 1) a Chinese TV program discussing Chinese moral norms with stories from society and 2) a collection of Chinese moral anomies from various newspapers and academic papers on morality. With these sources, we aim to create a moral evaluation dataset characterized by diversity and authenticity. We develop a morality taxonomy and a set of fundamental moral principles that are not only rooted in traditional Chinese culture but also consistent with contemporary societal norms. To facilitate efficient construction and annotation of instances in CMoralEval, we establish a platform with AI-assisted instance generation to streamline the annotation process. These tools help us curate CMoralEval, which encompasses both explicit moral scenarios (14,964 instances) and moral dilemma scenarios (15,424 instances), each drawn from different data sources. We conduct extensive experiments with CMoralEval to examine a variety of Chinese LLMs. Experiment results demonstrate that CMoralEval is a challenging benchmark for Chinese LLMs.
Co-authors
- Deyi Xiong 2
- Yong Cao 1
- Xiaoqing Cheng 1
- Tingting Cui 1
- Wei Hu (胡纬) 1