Ivan Nenchev


2025

We present a study investigating the linguistic sentiment associated with schizophrenia and depression in research-based texts. To this end, we construct a corpus of over 260,000 PubMed abstracts published between 1975 and 2025, covering both disorders. For sentiment analysis, we fine-tune two sentence-transformer models using SetFit with a training dataset consisting of sentences rated for valence by psychiatrists and clinical psychologists. Our analysis identifies significant temporal trends and differences between the two conditions. While the mean positive sentiment in abstracts and titles increases over time, a more detailed analysis reveals a marked rise in both maximum negative and maximum positive sentiment, suggesting a shift toward more polarized language. Notably, sentiment in abstracts on schizophrenia is significantly more negative overall. Furthermore, an exploratory analysis indicates that negative sentences are disproportionately concentrated at the beginning of abstracts. These findings suggest that linguistic style in scientific literature is evolving. We discuss the broader ethical and societal implications of these results and propose recommendations for more cautious language use in scientific discourse.
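As a rough illustration of the sentiment pipeline described above, the sketch below fine-tunes a sentence-transformer with SetFit on a handful of toy valence-labelled sentences and then scores a new sentence. The backbone checkpoint, label scheme, and example sentences are placeholders, not the study's actual rater-annotated data or chosen models.

```python
# Minimal SetFit fine-tuning sketch (hypothetical data and backbone;
# the clinician-rated valence sentences used in the study are not shown).
from datasets import Dataset
from setfit import SetFitModel, SetFitTrainer

# Toy stand-in for the rated training data: 0 = negative, 1 = neutral, 2 = positive.
train_ds = Dataset.from_dict({
    "text": [
        "Patients showed a marked deterioration in functioning.",
        "The sample comprised 120 outpatients.",
        "Participants reported substantial improvement after treatment.",
    ],
    "label": [0, 1, 2],
})

# Any sentence-transformer checkpoint can serve as the backbone.
model = SetFitModel.from_pretrained("sentence-transformers/paraphrase-mpnet-base-v2")

trainer = SetFitTrainer(
    model=model,
    train_dataset=train_ds,
    num_iterations=20,  # number of contrastive sentence pairs generated per class
)
trainer.train()

# Score a new abstract sentence.
print(model.predict(["Relapse rates remained alarmingly high."]))
```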
Large language models (LLMs) are increasingly being used to interpret and generate human language, yet their ability to process clinical language remains underexplored. This study examined whether three open-source LLMs can infer interviewer questions from participant responses in a semi-structured psychiatric interview (NET) conducted with individuals diagnosed with schizophrenia (n = 107) and neurotypical controls (n = 66). Using cosine similarity between LLM-generated questions and the original prompts as a proxy for the precision of the inference, we found that responses from individuals with schizophrenia produced significantly lower similarity scores (beta = –0.165, p < .001). Cosine similarity decreased across the nested structure of the interview, with smaller reductions observed in the schizophrenia group. Although all emotions, including fear, reduced similarity, only sadness showed a significant interaction with diagnosis, suggesting differential processing of emotional discourse. Model type and generation temperature also influenced outcomes, highlighting variability in model performance. Our findings demonstrate that LLMs systematically struggle to reconstruct interviewer intent from responses by individuals with schizophrenia, reflecting known discourse-level disturbances in the disorder.
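The cosine-similarity proxy can be illustrated with a few lines of sentence-transformers code; the embedding model and example questions below are hypothetical stand-ins for the original interviewer prompt and an LLM-reconstructed question, not the study's actual setup.

```python
# Sketch of the cosine-similarity proxy: embed the original interviewer
# question and an LLM-reconstructed question, then compare them.
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder embedding model

original_question = "Can you tell me about a situation that made you anxious?"
reconstructed_question = "What event caused you to feel nervous?"

emb = encoder.encode([original_question, reconstructed_question], convert_to_tensor=True)
similarity = util.cos_sim(emb[0], emb[1]).item()
print(f"cosine similarity: {similarity:.3f}")
```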

2024

We present a study of the linguistic output of the German-speaking writer Robert Walser using NLP. We curated a corpus comprising texts written by Walser during periods of sound health, writings from the year before his hospitalization, and writings from the first year of his stay in a psychiatric clinic, a hospitalization most likely attributable to schizophrenia. Within this corpus, we identified and analyzed a total of 20 linguistic markers encompassing established metrics for lexical diversity, semantic similarity, and syntactic complexity. Additionally, we explored lesser-known markers such as lexical innovation, concreteness, and imageability. Notably, we introduced two additional markers for phonological similarity for the first time within this context. Our findings reveal significant temporal dynamics in these markers closely associated with Walser's contemporaneous diagnosis of schizophrenia. Furthermore, we investigated the relationships among these markers and leveraged them to classify the schizophrenic episode.
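To give a flavour of how such markers can be operationalized, the sketch below computes a type-token ratio (lexical diversity) and a crude character-based word-similarity score as a stand-in for phonological similarity. This is a simplified assumption, not the study's implementation, which would more plausibly operate on phoneme representations of the German texts.

```python
# Illustrative marker computation: type-token ratio and mean pairwise
# normalized edit-distance similarity between word types (a crude proxy
# for phonological similarity).
from itertools import combinations

def type_token_ratio(tokens):
    return len(set(tokens)) / len(tokens) if tokens else 0.0

def levenshtein(a, b):
    # Classic dynamic-programming edit distance.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]

def mean_word_similarity(tokens):
    pairs = list(combinations(set(tokens), 2))
    if not pairs:
        return 0.0
    sims = [1 - levenshtein(a, b) / max(len(a), len(b)) for a, b in pairs]
    return sum(sims) / len(sims)

tokens = "der see lag still und der wald schwieg".split()
print(type_token_ratio(tokens), mean_word_similarity(tokens))
```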

2019

Incoherent discourse in schizophrenia has long been recognized as a dominant symptom of the mental disorder (Bleuler, 1911/1950). Recent studies have used modern sentence and word embeddings to compute coherence metrics for spontaneous speech in schizophrenia. While clinical ratings always have a subjective element, computational linguistic methodology allows speech abnormalities to be quantified. Clinical and empirical knowledge from psychiatry provides the theoretical and conceptual basis for modelling. Our study is an interdisciplinary attempt at improving coherence models in schizophrenia. Speech samples were obtained from healthy controls and from patients with a diagnosis of schizophrenia or schizoaffective disorder and differing severity of positive formal thought disorder. Interviews were transcribed, and coherence metrics were derived from different embedding models. One model yielded higher coherence metrics for controls than for patients; all other models remained non-significant. More detailed analysis of the data motivates different approaches to improving coherence models in schizophrenia, e.g. by assessing referential abnormalities.
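A minimal version of a first-order coherence metric, assuming sentence embeddings and cosine similarity between consecutive sentences (one common formulation, not necessarily the exact models or metric variants used here), could look like this:

```python
# Sketch of a first-order coherence metric: mean cosine similarity between
# consecutive sentence embeddings of a transcribed speech sample.
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder embedding model

sentences = [
    "I went to the market this morning.",
    "The vegetables were fresh and cheap.",
    "My brother called me in the afternoon.",
]
emb = encoder.encode(sentences, convert_to_tensor=True)
pairwise = [util.cos_sim(emb[i], emb[i + 1]).item() for i in range(len(sentences) - 1)]
print(sum(pairwise) / len(pairwise))
```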