Donghao Li
2026
CFlowPsyD: An Analysis-Enhanced Dataset for Asynchronous Psychological Counseling through Self-Optimizing Multi-Agent Framework
Donghao Li | Yifan Deng | Jinta Weng | Xingsheng Zhang | Chenxu Niu | Jingyuan Tian | Heyan Huang | Yue Hu
Findings of the Association for Computational Linguistics: ACL 2026
Asynchronous psychological counseling (APC) represents a crucial mental health service modality that transcends temporal and spatial constraints. However, its development faces significant data scarcity challenges: due to stringent privacy protection requirements and professional ethical considerations, direct collection of conversational data from authentic APC scenarios is virtually impossible. To address this challenge, we design a self-optimizing multi-agent framework for counseling dialogue generation, CFlowPsy, which utilizes a small amount of real anonymized counseling cases as seed data to synthesize diverse problem-solving-oriented APC conversations through large language models. Specifically, the framework employs a Persona-Flow module to continuously track and update clients’ basic information, real-time emotions, and counseling progress, providing dynamic personalized analytical support for counselors and enabling self-optimization of generated dialogues. Simultaneously, the framework ensures that generated interventions contain explicit reasoning processes, demonstrating clear psychological analysis and logic, thereby enhancing the accuracy and consistency of responses. Under this framework, we develop the first Chinese APC dataset, CFlowPsyD, comprising 1,700 high-quality extended conversations. Extensive experiments and human evaluations confirm that the proposed CFlowPsyD dataset successfully simulates human-like APC processes.
2024
PrivLM-Bench: A Multi-level Privacy Evaluation Benchmark for Language Models
Haoran Li | Dadi Guo | Donghao Li | Wei Fan | Qi Hu | Xin Liu | Chunkit Chan | Duanyi Yao | Yuan Yao | Yangqiu Song
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
The rapid development of language models (LMs) has made them unprecedentedly accessible and widely used. On the one hand, powerful LMs achieve state-of-the-art performance on numerous downstream NLP tasks. On the other hand, increasing attention is being paid to the risk that unrestricted model access may lead to privacy leakage. To address these issues, many recent works propose privacy-preserving language models (PPLMs) with differential privacy (DP). Unfortunately, differing DP implementations make a fair comparison among existing PPLMs challenging. In this paper, we present PrivLM-Bench, a multi-perspective privacy evaluation benchmark to empirically and intuitively quantify the privacy leakage of LMs. Instead of only reporting DP parameters, PrivLM-Bench sheds light on the neglected inference data privacy during actual usage. PrivLM-Bench first clearly defines multi-faceted privacy objectives. Then, PrivLM-Bench constructs a unified pipeline to perform private fine-tuning. Lastly, PrivLM-Bench runs existing privacy attacks against LMs under the pre-defined privacy objectives and reports the outcomes as empirical evaluation results. The empirical attack results are used to fairly and intuitively evaluate the privacy leakage of various PPLMs. We conduct extensive experiments on three GLUE datasets for mainstream LMs.