Xunde Dong

2026

The critical therapist shortage demands scalable training solutions. Standardized Patients, the gold standard, are scarce and costly. Current LLM-based approaches focus on patient simulation for conversational realism but lack the pedagogical rigor required of Virtual Standardized Patients: they neither react faithfully to clinical errors nor provide explainable feedback. To bridge this gap, we propose PUPPET, the first neural-symbolic Virtual Standardized Patient governed by an OBSERVE-THINK-BEHAVE architecture. PUPPET externalizes LLM reasoning into a symbolic system in which experts implant causal associations between intervention logic (propositional logic) and patient mental states (state machine). This allows PUPPET to behave coherently with controllable and explainable psychological dynamics: intervention logic (OBSERVE) → state transition (THINK) → response (BEHAVE). Our PUPPET-TRAINER further leverages this chain to educate trainees about intervention consequences, standardizing and scaling mental health training. Experiments across three clinical scenarios confirm that PUPPET outperforms baselines in clinical faithfulness and pedagogical value.

2024

CLGSI: A Multimodal Sentiment Analysis Framework based on Contrastive Learning Guided by Sentiment Intensity
Yang Yang | Xunde Dong | Yupeng Qiang
Findings of the Association for Computational Linguistics: NAACL 2024

Recently, contrastive learning has begun to gain popularity in multimodal sentiment analysis (MSA). However, most existing MSA methods based on contrastive learning do not learn in detail how sample pairs with different sentiment intensity differences are distributed in the contrastive learning representation space. In addition, limited research has been conducted on fusing the modality representations obtained through contrastive learning. In this paper, we propose a novel framework for multimodal sentiment analysis based on Contrastive Learning Guided by Sentiment Intensity (CLGSI). First, the proposed contrastive learning guided by sentiment intensity selects positive and negative sample pairs based on the difference in sentiment intensity and assigns corresponding weights accordingly. Subsequently, we propose a new multimodal representation fusion mechanism, called Global-Local-Fine-Knowledge (GLFK), which extracts common features between different modalities’ representations. At the same time, each unimodal encoder output is separately processed by a Multilayer Perceptron (MLP) to extract specific features of each modality. Finally, joint learning of the common and specific features is used to predict sentiment intensity. The effectiveness of CLGSI is assessed on two English datasets, MOSI and MOSEI, as well as one Chinese dataset, SIMS. We achieve competitive experimental results, which attest to the strong generalization performance of our approach. The code for our approach will be released at https://github.com/AZYoung233/CLGSI

Co-authors

Chen Xu 1

Yang Yi 1

Venues

ACL 1
Findings 1
