2025
Exploring the Role of Mental Health Conversational Agents in Training Medical Students and Professionals: A Systematic Literature Review
Thushari Atapattu | Menasha Thilakaratne | Duc Nhan Do | Mahen Herath | Katrina E. Falkner
Findings of the Association for Computational Linguistics: ACL 2025
The integration of Artificial Intelligence (AI) into mental health education and training (MHET) has emerged as a promising solution to meet the increasing demand for skilled mental health professionals. This systematic review analyses 38 studies on AI-powered conversational agents (CAs) in MHET, selected from a total of 1003 studies published between 2019 and 2024. Following the PRISMA protocol, we reviewed papers from computer science, medicine, and interdisciplinary databases, assessing key aspects such as technological approaches, data characteristics, application areas, and evaluation methodologies. Our findings reveal that AI-based approaches, including Large Language Models (LLMs), dominate the field, with training being the most prevalent application area. These technologies show promise in simulating therapeutic interactions but face challenges such as limited public datasets, a lack of standardised evaluation frameworks, and difficulty in ensuring authentic emotional responses, along with gaps in ethical considerations and clinical efficacy. This review presents a comprehensive framework for understanding the role of CAs in MHET and provides recommendations to guide future research.
2022
EmoMent: An Emotion Annotated Mental Health Corpus from Two South Asian Countries
Thushari Atapattu | Mahen Herath | Charitha Elvitigala | Piyanjali de Zoysa | Kasun Gunawardana | Menasha Thilakaratne | Kasun de Zoysa | Katrina Falkner
Proceedings of the 29th International Conference on Computational Linguistics
People often utilise online media (e.g., Facebook, Reddit) as a platform to express their psychological distress and seek support. State-of-the-art NLP techniques demonstrate strong potential to automatically detect mental health issues from text. Research suggests that mental health issues are reflected in the emotions (e.g., sadness) indicated by a person’s choice of language. We therefore developed a novel emotion-annotated mental health corpus (EmoMent), consisting of 2,802 Facebook posts (14,845 sentences) extracted from two South Asian countries, Sri Lanka and India. Three clinical psychology postgraduates annotated these posts into eight categories, including ‘mental illness’ (e.g., depression) and emotions (e.g., ‘sadness’, ‘anger’). The EmoMent corpus achieved ‘very good’ inter-annotator agreement of 98.3% (the percentage of posts on which two or more annotators agreed) and a Fleiss’ Kappa of 0.82. Our RoBERTa-based models achieved an F1 score of 0.76 on the first task (predicting a mental health condition from a post) and a macro-averaged F1 score of 0.77 on the second task (the extent to which relevant posts are associated with the categories defined in our taxonomy).
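The reported agreement statistics are standard and straightforward to reproduce. Below is a minimal sketch, assuming three annotators per post as in the abstract and using the inter-rater utilities from statsmodels; the labels shown are hypothetical and this is not the authors' evaluation code.

```python
# Minimal sketch: Fleiss' Kappa and two-or-more agreement for three
# annotators per post. Labels are hypothetical; illustrative only.
import numpy as np
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

# Rows = posts, columns = the three annotators, values = indices into
# the eight-category taxonomy (0-7).
labels = np.array([
    [0, 0, 0],  # unanimous
    [1, 1, 4],  # two-way agreement
    [2, 2, 2],
    [5, 5, 1],
])

# Share of posts on which at least two of the three annotators agree.
two_or_more = np.mean([len(set(row)) < len(row) for row in labels])

# aggregate_raters converts per-rater labels into the posts x categories
# count table that fleiss_kappa expects.
table, _ = aggregate_raters(labels)
print(f"Agreement (2+): {two_or_more:.1%}, Fleiss' Kappa: {fleiss_kappa(table):.2f}")
```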
2020
Automated Detection of Cyberbullying Against Women and Immigrants and Cross-domain Adaptability
Thushari Atapattu | Mahen Herath | Georgia Zhang | Katrina Falkner
Proceedings of the 18th Annual Workshop of the Australasian Language Technology Association
Cyberbullying is a prevalent and growing social problem driven by the surge in social media usage. Minorities, women, and adolescents are among the common victims of cyberbullying. Despite advances in NLP technologies, automated cyberbullying detection remains challenging. This paper focuses on advancing the technology using state-of-the-art NLP techniques. We use a Twitter dataset from SemEval 2019 - Task 5 (HatEval) on hate speech against women and immigrants. Our best performing ensemble model, based on DistilBERT, achieved F1 scores of 0.73 and 0.74 on the tasks of classifying hate speech (Task A) and aggressiveness and target (Task B), respectively. We adapted the ensemble model developed for Task A to classify offensive language in external datasets and achieved F1 scores of ~0.7 on three benchmark datasets, demonstrating promising cross-domain adaptability. We also conduct a qualitative analysis of misclassified tweets to provide recommendations for future cyberbullying research.
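As an illustration of the ensembling idea, the sketch below averages softmax probabilities from several fine-tuned DistilBERT checkpoints using Hugging Face transformers. The checkpoint paths are hypothetical, and the paper's actual ensembling strategy may differ (e.g., majority voting or stacking).

```python
# Minimal sketch: soft-voting ensemble over fine-tuned DistilBERT
# classifiers. Checkpoint paths are hypothetical placeholders.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

CHECKPOINTS = ["hateval-distilbert-fold1", "hateval-distilbert-fold2",
               "hateval-distilbert-fold3"]  # hypothetical paths
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

def ensemble_predict(texts):
    """Soft voting: average softmax probabilities over ensemble members."""
    enc = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    member_probs = []
    for path in CHECKPOINTS:
        model = AutoModelForSequenceClassification.from_pretrained(path)
        model.eval()
        with torch.no_grad():
            logits = model(**enc).logits
        member_probs.append(torch.softmax(logits, dim=-1))
    # Class 1 = hateful, class 0 = not hateful, in this sketch.
    return torch.stack(member_probs).mean(dim=0).argmax(dim=-1)
```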
Enhancing the Identification of Cyberbullying through Participant Roles
Gathika Rathnayake | Thushari Atapattu | Mahen Herath | Georgia Zhang | Katrina Falkner
Proceedings of the Fourth Workshop on Online Abuse and Harms
Cyberbullying is a prevalent social problem that inflicts detrimental consequences on the health and safety of victims, such as psychological distress, anti-social behaviour, and suicide. The automation of cyberbullying detection is a recent but widely researched problem, with current research focusing strongly on a binary classification of bullying versus non-bullying. This paper proposes a novel approach to enhancing cyberbullying detection through role modeling. We utilise a dataset from ASKfm to perform multi-class classification to detect participant roles (e.g., victim, harasser). Our preliminary results demonstrate promising performance, including F1 scores of 0.83 and 0.76 for cyberbullying and role classification respectively, outperforming baselines.
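A role classifier of this kind can be framed as standard multi-class text classification. The sketch below shows one plausible setup with Hugging Face transformers; the role inventory and base model are assumptions for illustration, not the paper's exact configuration.

```python
# Minimal sketch: multi-class participant-role classification.
# ROLES and the base model are hypothetical, not the paper's setup.
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          TextClassificationPipeline)

ROLES = ["victim", "harasser", "bystander_defender",
         "bystander_assistant", "other"]  # hypothetical role inventory
id2label = dict(enumerate(ROLES))

model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    num_labels=len(ROLES),
    id2label=id2label,
    label2id={role: i for i, role in id2label.items()},
)
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# After fine-tuning on role-annotated ASKfm threads, the pipeline maps
# each message to its most likely participant role.
classify_role = TextClassificationPipeline(model=model, tokenizer=tokenizer)
print(classify_role(["Leave me alone, I never did anything to you!"]))
```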
AdelaideCyC at SemEval-2020 Task 12: Ensemble of Classifiers for Offensive Language Detection in Social Media
Mahen Herath | Thushari Atapattu | Hoang Anh Dung | Christoph Treude | Katrina Falkner
Proceedings of the Fourteenth Workshop on Semantic Evaluation
This paper describes the systems our team (AdelaideCyC) developed for SemEval Task 12 (OffensEval 2020) to detect offensive language in social media. The challenge comprises three subtasks – offensive language identification (subtask A), offense type identification (subtask B), and offense target identification (subtask C). Our team participated in all three subtasks, developing ensembles of machine learning and deep learning models. We achieved F1 scores of 0.906, 0.552, and 0.623 in subtasks A, B, and C, respectively. While our performance on subtask A is promising, the results show that subtasks B and C remain challenging.
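As a rough illustration of a classical ensemble for subtask A, the sketch below combines TF-IDF features with soft voting in scikit-learn. The feature settings and ensemble members are assumptions for illustration rather than the AdelaideCyC configuration.

```python
# Minimal sketch: soft-voting ensemble of classical classifiers for
# offensive language identification (subtask A). Illustrative only.
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.ensemble import VotingClassifier, RandomForestClassifier

ensemble = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), min_df=2),
    VotingClassifier(
        estimators=[
            ("lr", LogisticRegression(max_iter=1000)),
            ("svm", SVC(probability=True)),  # probability=True enables soft voting
            ("rf", RandomForestClassifier(n_estimators=200)),
        ],
        voting="soft",  # average predicted probabilities across members
    ),
)

# Hypothetical usage: train_texts are tweets, y_train holds OFF/NOT labels.
# ensemble.fit(train_texts, y_train)
# preds = ensemble.predict(test_texts)
```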