ADHD-Lang: A Large-Scale Social Media Dataset for Verbal Behavior and Digital Phenotyping in Adult ADHD

Daniel Wiechmann, Elma Kerz, Edward Kempa, Yu Qiao


Abstract
We introduce ADHD-Lang, a large-scale language resource derived from Reddit to advance computational phenotyping of adult ADHD. The corpus pairs users whose ADHD diagnosis is confirmed by a high-precision self-disclosure pattern with a matched control cohort; it comprises 12,070 ADHD users (317,073 posts; 2.83M sentences) and 12,070 controls (174,765 posts; 1.27M sentences). In releasing ADHD-Lang to the research community, we also provide the first comprehensive baseline results, systematically examining the accuracy–transparency trade-off across three model families: (1) interpretable shallow machine learning models trained on clinically meaningful, expert-engineered language biomarkers; (2) a deep BiLSTM network trained on the same feature representations to capture temporal dynamics across users’ posts; and (3) black-box transformer-based models (BERT, RoBERTa, MentalRoBERTa) that rely on contextual embeddings, which are non-interpretable, high-dimensional representations. ADHD-Lang is released as a standardized benchmark to promote reproducible research and accelerate progress toward digital verbal-behavior phenotyping for adult ADHD.
Anthology ID:
2026.lrec-main.577
Volume:
Proceedings of the Fifteenth Language Resources and Evaluation Conference
Month:
May
Year:
2026
Address:
Palma de Mallorca, Spain
Editors:
Stelios Piperidis, Núria Bel, Henk van den Heuvel, Nancy Ide, Simon Krek, Antonio Toral
Venue:
LREC
Publisher:
ELRA Language Resource Association
Pages:
7279–7291
URL:
https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.577/
Cite (ACL):
Daniel Wiechmann, Elma Kerz, Edward Kempa, and Yu Qiao. 2026. ADHD-Lang: A Large-Scale Social Media Dataset for Verbal Behavior and Digital Phenotyping in Adult ADHD. In Proceedings of the Fifteenth Language Resources and Evaluation Conference, pages 7279–7291, Palma de Mallorca, Spain. ELRA Language Resource Association.
Cite (Informal):
ADHD-Lang: A Large-Scale Social Media Dataset for Verbal Behavior and Digital Phenotyping in Adult ADHD (Wiechmann et al., LREC 2026)
PDF:
https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.577.pdf