Moustafa Hassan

2026

The Divergence Hypothesis: Unmasking Lexical Interference and Label Bias in Mental Health NLP
Moustafa Hassan
BioNLP 2026

Computational mental health (CMH) classifiers often degrade under distribution shift because human annotators and distant-supervision pipelines reward different linguistic signals. We introduce TSS (Triple-Stream Stress probe), a multi-channel diagnostic framework that decomposes text into (A) lexical character n-grams, (B) a small, mostly content-free morpho-syntactic channel, and (C) a 154-feature psycholinguistic style channel. Across four English datasets (N = 12,906), TSS reveals a lexical interference effect: adding lexical features to the style channel reduces Macro-F1 on human-labeled data (mean drop 0.072, p < 10??) but not on auto-labeled data. We propose Degree of Divergence (DoD), a difference-in-differences statistic adapted from econometrics for label-source auditing, with instance-level bootstrap inference; the headline estimate is DoD(BC?A) = 0.0374, 95% CI [0.0097, 0.0651], p = 0.0032. A platform-stratified Twitter-only DoD (which removes the Reddit vs. Twitter contrast) reproduces the pattern with bootstrap inference: DoD??,BC?A = +0.096 (p < 0.001) and DoD??,AC?A = −0.089 (p < 0.001). Interventional masking (pos_only) retains ≈95–99% of Channel C’s performance after destroying content words on human datasets, indicating that the style channel does not rely primarily on lexical surface form. TSS is positioned as a diagnostic audit framework, not a clinical screening tool: it flags label-source-specific shortcut learning before generalization claims are made.
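The abstract does not spell out how DoD is computed, so the following is only a minimal sketch of a generic difference-in-differences statistic with instance-level bootstrap, not the paper's implementation. It assumes DoD is the Macro-F1 gap between two channel configurations on a human-labeled dataset minus the same gap on an auto-labeled dataset. The function names (`macro_f1`, `gap`, `did_bootstrap`), the percentile interval, and the toy inputs are all hypothetical choices made for illustration.

```python
import numpy as np

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over classes seen in labels or predictions."""
    scores = []
    for c in np.union1d(y_true, y_pred):
        tp = np.sum((y_true == c) & (y_pred == c))
        fp = np.sum((y_true != c) & (y_pred == c))
        fn = np.sum((y_true == c) & (y_pred != c))
        scores.append(2 * tp / (2 * tp + fp + fn))
    return float(np.mean(scores))

def gap(y, pred_1, pred_2, idx):
    """Macro-F1 of configuration 1 minus configuration 2 on the instances in idx."""
    return macro_f1(y[idx], pred_1[idx]) - macro_f1(y[idx], pred_2[idx])

def did_bootstrap(human, auto, n_boot=1000, seed=0):
    """Assumed form: (gap on human-labeled data) - (gap on auto-labeled data).

    human, auto: tuples (y_true, pred_config_1, pred_config_2) of 1-D arrays.
    Returns the point estimate and a 95% percentile bootstrap interval.
    """
    rng = np.random.default_rng(seed)
    human = tuple(np.asarray(a) for a in human)
    auto = tuple(np.asarray(a) for a in auto)
    n_h, n_a = len(human[0]), len(auto[0])
    point = gap(*human, np.arange(n_h)) - gap(*auto, np.arange(n_a))
    draws = np.empty(n_boot)
    for b in range(n_boot):
        # Resample instances with replacement, independently within each dataset;
        # both configurations are scored on the same resampled instances.
        idx_h = rng.integers(0, n_h, n_h)
        idx_a = rng.integers(0, n_a, n_a)
        draws[b] = gap(*human, idx_h) - gap(*auto, idx_a)
    lo, hi = np.percentile(draws, [2.5, 97.5])
    return point, (float(lo), float(hi))
```

Under this assumed definition, a positive value means the first configuration outperforms the second by more on human-labeled data than on auto-labeled data. Scoring both configurations on the same resampled indices keeps the comparison paired within each dataset.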
