Creating Multilingual Mental Health Dialogue Datasets: Limits of Persona-Based Localization via Nationality and Language

Yunkai Xu; Saeed Abdullah

Abstract

AI and large language models (LLMs) have emerged as promising tools for addressing global mental health challenges. Despite the global nature of these challenges, there remains a critical shortage of high-quality datasets for training and evaluating such systems. To mitigate this gap, researchers increasingly generate synthetic clinical personas to simulate user data and test digital mental health support systems. However, most validated personas rely on English-centric contexts. This paper investigates whether similar persona-based methods can be used to generate multilingual mental health datasets. We modified the nationality and language parameters of personas to generate clinical dialogues in Mandarin, Bengali, and Hindi. We then examined how different LLMs perform when assessing depression severity in these generated multilingual datasets, compared with an English baseline. Our findings indicate that simply adding nationality and language parameters to personas may not be adequate, as it can introduce clinical inconsistency across languages. LLM judge models often assess depression severity inaccurately in non-English texts, with performance varying across models. This exposes the systemic limitations of applying English-centric personas to multilingual contexts. Ultimately, our work highlights the urgent need for culturally responsive data generation to ensure equitable mental health systems globally.

Anthology ID: 2026.clpsych-1.11
Volume: Proceedings of the 10th Workshop on Computational Linguistics and Clinical Psychology (CLPsych 2026)
Month: July
Year: 2026
Address: San Diego, California, USA
Editors: Aya Zirikly, Kfir Bar, Sean MacAvaney, Molly Ireland, Yaakov Ophir, Dana Atzil-Slonim, Vasudha Varadarajan, Steven Bedrick, Bart Desmet
Venues: CLPsych | WS
Publisher: Association for Computational Linguistics
Pages: 138–152
URL: https://preview.aclanthology.org/ingest-acl-workshops/2026.clpsych-1.11/
Cite (ACL): Yunkai Xu and Saeed Abdullah. 2026. Creating Multilingual Mental Health Dialogue Datasets: Limits of Persona-Based Localization via Nationality and Language. In Proceedings of the 10th Workshop on Computational Linguistics and Clinical Psychology (CLPsych 2026), pages 138–152, San Diego, California, USA. Association for Computational Linguistics.
Cite (Informal): Creating Multilingual Mental Health Dialogue Datasets: Limits of Persona-Based Localization via Nationality and Language (Xu & Abdullah, CLPsych 2026)
PDF: https://preview.aclanthology.org/ingest-acl-workshops/2026.clpsych-1.11.pdf
