Abstract
Objective: A thematic and topic modelling analysis of sleep concerns in a social media-derived, privacy-preserving suicidality dataset. This forms the basis for an exploration of sleep as a potential computational linguistic signal in suicide prevention. Background: Suicidal ideation is a limited signal for suicide. Developments in computational linguistics and mental health datasets afford an opportunity to investigate additional signals and to consider the broader clinical and ethical design implications. Methodology: A clinician-led integration of reflexive thematic analysis with machine learning topic modelling (BERTopic), using purposeful sampling of the University of Maryland Suicidality Dataset. Results: Three core themes were generated in an initial thematic analysis of 546 posts: sleep as a place of refuge and escape, as revitalisation for exhaustion, and as risk and vulnerability. BERTopic, applied to 21,876 sleep references in 16,791 posts, produced 40 topics that were clinically interpretable, relevant, and thematically aligned to a degree that exceeded original expectations. Privacy and representative synthetic data, reproducibility, validity and the stochastic variability of results, and a multi-signal formulation perspective are highlighted as key research and clinical issues.
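The results distinguish sleep references (21,876) from the posts that contain them (16,791), because a single post can mention sleep more than once. The abstract does not say how references were identified. The sketch below assumes sentence-level keyword matching against a hypothetical sleep lexicon (`SLEEP_PATTERN`), and the helper `extract_sleep_references` and the example posts are likewise illustrative. It is not the authors' pipeline. Under this assumption, the extracted sentences could serve as the documents passed to BERTopic's `fit_transform`.

```python
import re
from typing import List, Tuple

# Hypothetical sleep lexicon (an assumption; the study's actual terms are not given).
SLEEP_PATTERN = re.compile(
    r"\b(sleep\w*|slept|insomnia|nightmares?|tired|exhausted|naps?)\b",
    re.IGNORECASE,
)


def extract_sleep_references(posts: List[str]) -> List[Tuple[int, str]]:
    """Return (post_index, sentence) pairs for each sentence that mentions a sleep term."""
    references = []
    for post_index, text in enumerate(posts):
        # Naive sentence split: break after ., ! or ? followed by whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", text):
            if SLEEP_PATTERN.search(sentence):
                references.append((post_index, sentence.strip()))
    return references


# Invented example posts, for illustration only.
posts = [
    "I can't sleep anymore. Every night is the same.",
    "Sleep is the only escape I have. I just want to sleep forever.",
    "Work was fine today.",
]

refs = extract_sleep_references(posts)
# Several references can come from the same post, so the reference count
# is at least as large as the count of posts containing a reference.
n_posts = len({post_index for post_index, _ in refs})
print(len(refs), n_posts)  # 3 2
```

Here three references come from two posts, mirroring on a toy scale why the reference count in the results exceeds the post count.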