Moustafa Hassan
2026
The Divergence Hypothesis: Unmasking Lexical Interference and Label Bias in Mental Health NLP
Moustafa Hassan
BioNLP 2026
Computational mental health (CMH) classifiers often degrade under distribution shift because human annotators and distant-supervision pipelines reward different linguistic signals. We introduce TSS (Triple-Stream Stress probe), a multi-channel diagnostic framework that decomposes text into three channels: (A) lexical character n-grams, (B) a small, mostly content-free morpho-syntactic channel, and (C) a 154-feature psycholinguistic style channel. Across four English datasets (N = 12,906), TSS reveals a lexical interference effect: adding lexical features to the style channel reduces Macro-F1 on human-labeled data (mean drop 0.072, p 10??) but not on auto-labeled data. We propose the Degree of Divergence (DoD), a difference-in-differences statistic adapted from econometrics for label-source auditing, with instance-level bootstrap inference; the headline estimate is DoD(BC−A) = 0.0374 (95% CI [0.0097, 0.0651], p = 0.0032). A platform-stratified, Twitter-only DoD, which removes the Reddit vs. Twitter contrast, reproduces the pattern under bootstrap inference: the Twitter-only estimates are DoD(BC−A) = +0.096 (p < 0.001) and DoD(AC−A) = −0.089 (p < 0.001). Interventional masking (pos_only), which destroys content words, retains roughly 95–99% of Channel C’s performance on the human-labeled datasets, indicating that the style channel does not rely primarily on lexical surface form. TSS is positioned as a diagnostic audit framework, not a clinical screening tool: it flags label-source-specific shortcut learning before generalization claims are made.
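To make the DoD statistic concrete, here is a minimal sketch under stated assumptions. It is not the paper's released code. It assumes DoD(X−Y) = [F1_h(X) − F1_h(Y)] − [F1_a(X) − F1_a(Y)], where F1_h and F1_a are Macro-F1 on the human-labeled and auto-labeled pools, and X and Y are two channel configurations (for example BC and A). It also assumes a paired percentile bootstrap that resamples test instances separately within each pool, plus a two-sided p-value taken from how often the resampled DoD falls on either side of zero. The function names `macro_f1`, `dod`, and `bootstrap_dod`, the resampling scheme, and the p-value rule are all hypothetical illustrations. Pooling across the four datasets is not modeled.

```python
import numpy as np


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over classes seen in y_true or y_pred."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    scores = []
    for c in np.union1d(y_true, y_pred):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        denom = 2 * tp + fp + fn  # > 0 because c occurs in y_true or y_pred
        scores.append(2 * tp / denom)
    return float(np.mean(scores))


def dod(human, auto):
    """Assumed DoD(X - Y): the X-vs-Y Macro-F1 gap on human labels minus the same gap on auto labels.

    human, auto: tuples (y_true, pred_X, pred_Y) for each label source (hypothetical layout).
    """
    def gap(y, pred_x, pred_y):
        return macro_f1(y, pred_x) - macro_f1(y, pred_y)

    return gap(*human) - gap(*auto)


def bootstrap_dod(human, auto, n_boot=2000, seed=0):
    """Paired instance-level bootstrap: X and Y are scored on the same resampled instances."""
    rng = np.random.default_rng(seed)
    human = tuple(np.asarray(a) for a in human)
    auto = tuple(np.asarray(a) for a in auto)
    point = dod(human, auto)
    n_h, n_a = len(human[0]), len(auto[0])
    stats = np.empty(n_boot)
    for b in range(n_boot):
        ih = rng.integers(0, n_h, size=n_h)  # resample human-labeled instances with replacement
        ia = rng.integers(0, n_a, size=n_a)  # resample auto-labeled instances independently
        stats[b] = dod(tuple(a[ih] for a in human), tuple(a[ia] for a in auto))
    lo, hi = np.percentile(stats, [2.5, 97.5])  # 95% percentile interval
    # Two-sided p-value (assumed rule): twice the smaller tail mass on either side of zero.
    p = min(1.0, 2.0 * min(np.mean(stats <= 0), np.mean(stats >= 0)))
    return point, (float(lo), float(hi)), float(p)
```

Under this convention, a positive DoD(BC−A) means the advantage of BC over A is larger on human-labeled data than on auto-labeled data. Resampling the predictions of X and Y on shared indices keeps the comparison paired, which is the usual way to get instance-level uncertainty for a difference between two systems.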