We investigate linguistic markers associated with schizophrenia in clinical conversations by detecting predictive features among French-speaking patients. Dealing with human-human dialogues makes for a realistic situation, but it calls for strategies to represent the context and face data sparsity. We compare different approaches for data representation – from individual speech turns to entire conversations –, and data modeling, using lexical, morphological, syntactic, and discourse features, dimensions presumed to be tightly connected to the language of schizophrenia. Previous English models were mostly lexical and reached high performance, here replicated (93.7% acc.). However, our analysis reveals that these models are heavily biased, which probably concerns most datasets on this task. Our new delexicalized models are more general and robust, with the best accuracy score at 77.9%.
Nous présentons des expériences visant à identifier automatiquement des patients présentant des symptômes de schizophrénie dans des conversations contrôlées entre patients et psychothérapeutes. Nous fusionnons l’ensemble des tours de parole de chaque interlocuteur et entraînons des modèles de classification utilisant des informations lexicales, morphologiques et syntaxiques. Cette étude est la première du genre sur le français et obtient des résultats comparables à celles sur l’anglais. Nos premières expériences tendent à montrer que la parole des personnes avec schizophrénie se distingue de celle des témoins : le meilleur modèle obtient une exactitude de 93,66%. Des informations plus riches seront cependant nécessaires pour parvenir à un modèle robuste.
The main aim of this paper is to provide a characterization of the response space for questions using a taxonomy grounded in a dialogical formal semantics. As a starting point we take the typology for responses in the form of questions provided in (Lupkowski and Ginzburg, 2016). This work develops a wide coverage taxonomy for question/question sequences observable in corpora including the BNC, CHILDES, and BEE, as well as formal modelling of all the postulated classes. Our aim is to extend this work to cover all responses to questions. We present the extended typology of responses to questions based on a corpus studies of BNC, BEE and Maptask with include 506, 262, and 467 question/response pairs respectively. We compare the data for English with data from Polish using the Spokes corpus (205 question/response pairs). We discuss annotation reliability and disagreement analysis. We sketch how each class can be formalized using a dialogical semantics appropriate for dialogue management.