Hongbin Na
2026
Narrative Nexus at SemEval-2026 Task 4: Modeling Narrative Similarity via Instruction-Based Fine-Tuning and Synthetic Data Augmentation
Haotan Guo | Hongbin Na | Zimu Wang | Wei Wang
Proceedings of the 20th International Workshop on Semantic Evaluation (2026)
Narrative similarity assessment requires models to reason beyond surface-level lexical overlap and capture higher-level plot structures and thematic relationships. In this paper, we address SemEval-2026 Task 4 Track A: Narrative Story Similarity by reformulating it as an instruction-following generation problem. We employ parameter-efficient fine-tuning via LoRA to adapt pretrained large language models for triplet-based narrative comparison. To overcome the limitations imposed by the scarcity of human-annotated data, we further incorporate synthetic triplet samples generated by a large language model for data augmentation. Experimental results demonstrate that our fine-tuned Qwen2.5-7B model achieves competitive performance, outperforming the zero-shot GPT-4o-mini baseline. These findings underscore the effectiveness of task-specific adaptation combined with synthetic data augmentation for narrative similarity modeling.
You Never Know a Person, You Only Know Their Defenses: Detecting Levels of Psychological Defense Mechanisms in Supportive Conversations
Hongbin Na | Zimu Wang | Zhaoming Chen | Peilin Zhou | Yining Hua | Grace Ziqi Zhou | Haiyang Zhang | Tao Shen | Wei Wang | John Torous | Shaoxiong Ji | Ling Chen
Findings of the Association for Computational Linguistics: ACL 2026
Psychological defenses are strategies, often automatic, that people use to manage distress. Rigid use or overuse of defenses is negatively linked to mental health and shapes what speakers disclose and how they accept or resist help. However, defenses are complex and difficult to reliably measure, particularly in clinical dialogues. We introduce PsyDefConv, a dialogue corpus with help seeker utterances labeled for defense level, and DMRS Co-Pilot, a four-stage pipeline that provides evidence-based pre-annotations. The corpus contains 200 dialogues and 4,709 utterances, including 2,336 help seeker turns, with double-blind labeling reaching Cohen’s kappa of 0.639. In a counterbalanced study, the co-pilot reduced average annotation time by 24.0%. In expert review, it averaged 4.62 for evidence supportiveness, 4.44 for clinical plausibility, and 4.40 for insight on a seven-point scale. Benchmarks with strong large language models (LLMs) in zero-shot and fine-tuning settings demonstrate clear headroom, with the best macro F1-score around 30% and a tendency to overpredict mature defenses. Corpus analyses confirm that mature defenses are most common and reveal emotion-specific deviations. We release the corpus, annotations, code, and prompts to support research on defensive functioning in language.
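For readers unfamiliar with the agreement statistic quoted above, Cohen's kappa compares the observed agreement between two annotators with the agreement expected by chance. The following is a minimal sketch in Python; the function name, label names, and toy annotations are invented for illustration and are not drawn from PsyDefConv.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    # Observed agreement: fraction of items given identical labels.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: sum over labels of the product of marginal rates.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both annotators used one and the same label throughout
    return (p_o - p_e) / (1 - p_e)

# Hypothetical toy annotations (not real corpus data).
annotator_1 = ["mature", "mature", "neurotic", "immature", "mature", "neurotic"]
annotator_2 = ["mature", "neurotic", "neurotic", "immature", "mature", "mature"]
kappa = cohens_kappa(annotator_1, annotator_2)
```

In this toy case the annotators agree on 4 of 6 items, but because both use the same label frequencies, chance agreement is already 14/36, so kappa is well below the raw agreement rate.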
Overview of the PsyDefDetect Shared Task at BioNLP 2026: Detecting Levels of Psychological Defense Mechanisms in Supportive Conversations
Hongbin Na | Zimu Wang | Zhaoming Chen | Yining Hua | Rena Gao | Kailai Yang | Ling Chen | Wei Wang | Shaoxiong Ji | John Torous | Sophia Ananiadou
BioNLP 2026
We present an overview of PsyDefDetect, the shared task on detecting levels of psychological defense mechanisms in emotional support dialogues, co-located with BioNLP@ACL 2026. Grounded in the clinically validated Defense Mechanism Rating Scales (DMRS) framework, the task asks systems to classify a target seeker utterance, given its preceding dialogue context, into one of nine categories: seven hierarchical DMRS levels plus two auxiliary labels. Participants worked on PsyDefConv, a newly released corpus of 200 dialogues and 2,336 help-seeker utterances annotated under DMRS with substantial inter-annotator agreement. The task attracted 172 participants on CodaBench who produced 563 submissions, with 21 teams officially registering their results for the final ranking. The best system achieved a macro F1-score of 0.420, surpassing the strongest fine-tuned baseline reported in the dataset paper by a notable margin, yet leaving clear headroom. Our analysis highlights (i) a persistent tendency to over-predict the majority High-Adaptive class, (ii) a widening gap between accuracy and macro-F1 that reveals class-imbalance sensitivity, and (iii) the value of theory-aware and LLM-based approaches for fine-grained defensive-function classification. We release all task materials and invite the community to continue work on this novel intersection of clinical psychology and NLP.
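The gap between accuracy and macro-F1 noted in point (ii) can be illustrated with a toy example. The sketch below uses invented labels and counts (not task data, and not the official scorer) to show how a system that always predicts the majority class can score high accuracy while its macro-F1 stays low.

```python
def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over the classes present in gold."""
    scores = []
    for c in sorted(set(gold)):
        tp = sum(g == c and p == c for g, p in zip(gold, pred))
        fp = sum(g != c and p == c for g, p in zip(gold, pred))
        fn = sum(g == c and p != c for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class is never hit.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

def accuracy(gold, pred):
    """Fraction of items whose predicted label matches the gold label."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)

# Hypothetical imbalanced gold labels; the class names are placeholders.
gold = ["high_adaptive"] * 8 + ["neurotic"] + ["immature"]
# A degenerate system that always predicts the majority class.
pred = ["high_adaptive"] * 10

acc = accuracy(gold, pred)
mf1 = macro_f1(gold, pred)
```

Here accuracy is 0.8, yet the two minority classes each contribute an F1 of zero, which pulls macro-F1 down to roughly 0.30. This is why macro-F1 is the more informative metric when one class dominates.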
2025
Hazards in Daily Life? Enabling Robots to Proactively Detect and Resolve Anomalies
Zirui Song | Guangxian Ouyang | Meng Fang | Hongbin Na | Zijing Shi | Zhenhao Chen | Fu Yujie | Zeyu Zhang | Shiyu Jiang | Miao Fang | Ling Chen | Xiuying Chen
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
Existing household robots have made significant progress in performing routine tasks, such as cleaning floors or delivering objects. However, a key limitation of these robots is their inability to recognize potential problems or dangers in home environments. For example, a child may pick up and ingest medication that has fallen on the floor, posing a serious risk. We argue that household robots should proactively detect such hazards or anomalies within the home, and propose the task of anomaly scenario generation. To accomplish this task, we leverage foundational models instead of relying on manually labeled data to build simulated environments. Specifically, we introduce a multi-agent brainstorming approach, where agents collaborate and generate diverse scenarios covering household hazards, hygiene management, and child safety. These textual task descriptions are then integrated with designed 3D assets to simulate realistic environments. Within these constructed environments, our LLM-based robotic agent learns the necessary skills to proactively discover and handle the proposed anomalies through task decomposition and optimal learning approach selection. We demonstrate that our generated environment outperforms others in terms of task description and scene diversity, ultimately enabling robotic agents to better address potential household hazards.
A Survey of Large Language Models in Psychotherapy: Current Landscape and Future Directions
Hongbin Na | Yining Hua | Zimu Wang | Tao Shen | Beibei Yu | Lilin Wang | Wei Wang | John Torous | Ling Chen
Findings of the Association for Computational Linguistics: ACL 2025
Mental health is increasingly critical in contemporary healthcare, with psychotherapy demanding dynamic, context-sensitive interactions that traditional NLP methods struggle to capture. Large Language Models (LLMs) offer significant potential for addressing this gap due to their ability to handle extensive context and multi-turn reasoning. This review introduces a conceptual taxonomy dividing psychotherapy into interconnected stages (assessment, diagnosis, and treatment) to systematically examine LLM advancements and challenges. Our comprehensive analysis reveals imbalances in current research, such as a focus on common disorders, linguistic biases, fragmented methods, and limited theoretical integration. We identify critical challenges including capturing dynamic symptom fluctuations, overcoming linguistic and cultural biases, and ensuring diagnostic reliability. Highlighting future directions, we advocate for continuous multi-stage modeling, real-time adaptive systems grounded in psychological theory, and diversified research covering broader mental disorders and therapeutic approaches, aiming toward more holistic and clinically integrated psychotherapy LLM systems.
Lost in Pronunciation: Detecting Chinese Offensive Language Disguised by Phonetic Cloaking Replacement
Haotan Guo | Jianfei He | Jiayuan Ma | Hongbin Na | Zimu Wang | Haiyang Zhang | Qi Chen | Wei Wang | Zijing Shi | Tao Shen | Ling Chen
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track
Phonetic Cloaking Replacement (PCR), defined as the deliberate use of homophonic or near-homophonic variants to hide toxic intent, has become a major obstacle to Chinese content moderation. While this problem is well-recognized, existing evaluations predominantly rely on rule-based, synthetic perturbations that ignore the creativity of real users. We organize PCR into a four-way surface-form taxonomy and compile PCR-ToxiCN, a dataset of 500 naturally occurring, phonetically cloaked offensive posts gathered from the RedNote platform. Benchmarking state-of-the-art LLMs on this dataset exposes a serious weakness: the best model reaches only an F1-score of 0.672, and zero-shot chain-of-thought prompting pushes performance even lower. Guided by error analysis, we revisit a Pinyin-based prompting strategy that earlier studies judged ineffective and show that it recovers much of the lost accuracy. This study offers the first comprehensive taxonomy of Chinese PCR, a realistic benchmark that reveals current detectors’ limits, and a lightweight mitigation technique that advances research on robust toxicity detection.
Detecting Conversational Mental Manipulation with Intent-Aware Prompting
Jiayuan Ma | Hongbin Na | Zimu Wang | Yining Hua | Yue Liu | Wei Wang | Ling Chen
Proceedings of the 31st International Conference on Computational Linguistics
Mental manipulation severely undermines mental wellness by covertly and negatively distorting decision-making. While there is increasing interest in mental health care within the natural language processing community, progress in tackling manipulation remains limited due to the complexity of detecting subtle, covert tactics in conversations. In this paper, we propose Intent-Aware Prompting (IAP), a novel approach for detecting mental manipulation using large language models (LLMs), providing a deeper understanding of manipulative tactics by capturing the underlying intents of participants. Experimental results on the MentalManip dataset demonstrate the superior effectiveness of IAP over other advanced prompting strategies. Notably, our approach substantially reduces false negatives, helping detect more instances of mental manipulation with minimal misjudgment of positive cases. The code of this paper is available at https://github.com/Anton-Jiayuan-MA/Manip-IAP.
From Posts to Timelines: Modeling Mental Health Dynamics from Social Media Timelines with Hybrid LLMs
Zimu Wang | Hongbin Na | Rena Gao | Jiayuan Ma | Yining Hua | Ling Chen | Wei Wang
Proceedings of the 10th Workshop on Computational Linguistics and Clinical Psychology (CLPsych 2025)
Social media data is recognized for its usefulness in the early detection of mental disorders; however, there is a lack of research focused on modeling individuals’ longitudinal mental health dynamics. Moreover, fine-tuning large language models (LLMs) on large-scale, annotated datasets presents challenges due to privacy concerns and the difficulties of data collection and annotation. In this paper, we propose a novel approach for modeling mental health dynamics using hybrid LLMs, where we first apply both classification-based and generation-based models to identify adaptive and maladaptive evidence from individual posts. This evidence is then used to predict well-being scores and generate post-level and timeline-level summaries. Experimental results on the CLPsych 2025 shared task demonstrate the effectiveness of our method, with the generation-based model showing a marked advantage in evidence identification.
Thinker-DDM: Modeling Deliberation for Machine Translation with a Drift-Diffusion Process
Hongbin Na | Zimu Wang | Mieradilijiang Maimaiti | Tong Chen | Wei Wang | Tao Shen | Ling Chen
Proceedings of the 23rd Annual Workshop of the Australasian Language Technology Association
Large language models (LLMs) have demonstrated promising potential in various downstream tasks, including machine translation. However, prior work on LLM-based machine translation has mainly focused on better utilizing training data, demonstrations, or pre-defined and universal knowledge to improve performance, giving little consideration to the kind of decision-making that human translators perform. In this paper, we incorporate Thinker with the Drift-Diffusion Model (Thinker-DDM) to address this issue. We then redefine the Drift-Diffusion process to emulate human translators’ dynamic decision-making under constrained resources. We conduct extensive experiments under the high-resource, low-resource, and commonsense translation settings using the WMT22 and CommonMT datasets, in which Thinker-DDM outperforms baselines in the first two scenarios. We also perform additional analysis and evaluation on commonsense translation to illustrate the high effectiveness and efficacy of the proposed method.
2024
CBT-LLM: A Chinese Large Language Model for Cognitive Behavioral Therapy-based Mental Health Question Answering
Hongbin Na
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Recent advancements in artificial intelligence highlight the potential of language models in psychological health support. While models trained on data from mental health service platforms have achieved preliminary success, challenges persist in areas such as data scarcity, data quality, and ensuring a solid foundation in psychological techniques. To address these challenges, this study introduces a novel approach to enhance the precision and efficacy of psychological support through large language models. Specifically, we design a prompt derived from principles of Cognitive Behavioral Therapy (CBT) and use it to generate the CBT QA dataset, built for Chinese psychological health Q&A based on CBT structured intervention strategies. Unlike previous methods, our dataset emphasizes professional and structured responses. Utilizing this dataset, we fine-tuned a large language model to produce CBT-LLM, a large-scale language model specifically designed for Cognitive Behavioral Therapy techniques. Empirical evaluations demonstrate that CBT-LLM excels in generating structured, professional, and highly relevant responses in psychological health support tasks, showcasing its practicality and quality. The model is available on Hugging Face: https://huggingface.co/Hongbin37/CBT-LLM.
Co-authors
- Ling Chen 8
- Zimu Wang 8
- Yining Hua 5
- Wei Wang 5
- Tao Shen 4
- Jiayuan Ma 3
- John Torous 3
- Wei Wang 3
- Zhaoming Chen 2
- Rena Wei Gao 2
- Haotan Guo 2
- Shaoxiong Ji 2
- Zijing Shi 2
- Haiyang Zhang 2
- Sophia Ananiadou 1
- Qi Chen 1
- Tong Chen 1
- Xiuying Chen 1
- Zhenhao Chen 1
- Meng Fang 1
- Miao Fang 1
- Jianfei He 1
- Shiyu Jiang 1
- Yue Liu 1
- Mieradilijiang Maimaiti 1
- Guangxian Ouyang 1
- Zirui Song 1
- Lilin Wang 1
- Kailai Yang 1
- Beibei Yu 1
- Fu Yujie 1
- Zeyu Zhang 1
- Grace Ziqi Zhou 1
- Peilin Zhou 1