Qing Zhao

2025

Cognitive distortion is a critical issue in psychology, and most existing studies are based on Burns’ cognitive distortion theory. However, differences in annotation standards lead to differences in how analysis tools are built, resulting in inconsistent analyses and limiting the generalizability of findings, especially in large-scale and cross-linguistic contexts. To address this issue, we collected all publicly available datasets (four in total) and conducted a series of experiments to evaluate the generalizability of various cross-linguistic models. The results indicate that models exhibit significant performance differences across datasets, highlighting the generalization problem. To mitigate this issue, we propose two solutions. First, we propose a multi-task learning model based on a teacher-student architecture, which demonstrates improved generalization performance in our experiments. Second, we introduce a new dataset (~5,000 samples) derived from reannotating existing open datasets to ensure standardized alignment. The annotation process we provide is interpretable and grounded in psychological principles. Based on this dataset, we constructed large language models with cognitive reasoning chains, enhancing both generalizability and interpretability. This study identifies the generalization challenge in cognitive distortion research, and our experiments show that the proposed solutions significantly improve model performance. The dataset and code are publicly available at: https://github.com/HongzhiQ/CrossLinCD.

2024

Psychological issues are now widespread, and social media serves as a key outlet for individuals to share their feelings. This generates vast quantities of data daily, in which negative emotions have the potential to precipitate crisis situations. There is a recognized need for models capable of analyzing this data efficiently. While pre-trained language models have demonstrated their effectiveness broadly, there is a noticeable gap in pre-trained models tailored for specialized domains like psychology. To address this, we collected a large dataset from Chinese social media platforms and enriched it with publicly available datasets to create a comprehensive database encompassing 3.36 million text entries. To enhance the model’s applicability to psychological text analysis, we integrated psychological lexicons into the pre-training masking mechanism. Building on an existing Chinese language model, we performed adaptive training to develop a model specialized for the psychological domain. We evaluated our model’s performance across six public datasets, where it demonstrated improvements compared to eight other models. Additionally, in a qualitative comparison experiment, our model provided psychologically relevant predictions for masked sentences. Due to concerns regarding data privacy, the dataset will not be made publicly available. However, we have made the pre-trained models and code publicly accessible to the community via: https://github.com/zwzzzQAQ/Chinese-MentalBERT.
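The abstract above mentions integrating psychological lexicons into the pre-training masking mechanism without giving details. One way such a scheme could look is sketched below: tokens found in a lexicon are masked first, and any remaining mask budget is filled with randomly chosen other tokens. This is an illustrative assumption, not the authors' implementation; the function name `lexicon_guided_mask`, the 15% default rate, the `[MASK]` token string, and the prioritization rule are all hypothetical.

```python
import random


def lexicon_guided_mask(tokens, lexicon, mask_rate=0.15,
                        mask_token="[MASK]", rng=None):
    """Mask roughly `mask_rate` of `tokens`, preferring lexicon words.

    Hypothetical sketch: the selection rule (lexicon tokens first, then
    random others) is an assumption, not the published method.
    Returns (masked_tokens, labels); labels hold the original token at
    masked positions and None elsewhere.
    """
    rng = rng or random.Random()
    # Mask at least one token when the input is non-empty.
    n_to_mask = max(1, round(len(tokens) * mask_rate)) if tokens else 0

    # Split positions into lexicon hits and everything else.
    lexicon_idx = [i for i, t in enumerate(tokens) if t in lexicon]
    other_idx = [i for i, t in enumerate(tokens) if t not in lexicon]
    rng.shuffle(lexicon_idx)
    rng.shuffle(other_idx)

    # Lexicon positions come first, so they consume the budget first.
    chosen = set((lexicon_idx + other_idx)[:n_to_mask])

    masked = [mask_token if i in chosen else t for i, t in enumerate(tokens)]
    labels = [t if i in chosen else None for i, t in enumerate(tokens)]
    return masked, labels
```

With a budget of two masks and two lexicon words in the sentence, for instance, both lexicon words are masked and the remaining tokens are left untouched, so the model is always asked to predict the psychologically relevant terms.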

Co-authors

Qi Gao 1

Venues

findings 2
