2024
Cross-Lingual Examination of Language Features and Cognitive Scores From Free Speech
Hali Lindsay | Giorgia Albertin | Louisa Schwed | Nicklas Linz | Johannes Tröger
Proceedings of the Fifth Workshop on Resources and ProcessIng of linguistic, para-linguistic and extra-linguistic Data from people with various forms of cognitive/psychiatric/developmental impairments @LREC-COLING 2024
Speech analysis is gaining significance for monitoring neurodegenerative disorders, but for application in clinical practice, solid evidence of the association between language features and cognitive scores is still needed. We conducted a cross-linguistic investigation to examine whether language features show significant correlations with two cognitive scores, the Mini-Mental State Examination (MMSE) and the ki:e SB-C, in Alzheimer’s Disease patients. We explore 23 language features, representative of syntactic complexity and semantic richness, extracted from a dataset of free speech recordings of 138 participants distributed across four languages (Spanish, Catalan, German, Dutch). Data was analyzed using the speech library SIGMA; Pearson’s correlation was computed with Bonferroni correction, and a mixed effects linear regression analysis was performed on the significantly correlated results. MMSE and SB-C scores are found to be correlated, with no significant differences across languages. Three features were found to be significantly correlated with the SB-C scores. Among these, two features of lexical richness show consistent patterns across languages, while determiner rate shows language-specific patterns.
2022
Generating Synthetic Clinical Speech Data through Simulated ASR Deletion Error
Hali Lindsay | Johannes Tröger | Mario Magued Mina | Philipp Müller | Nicklas Linz | Jan Alexandersson | Inez Ramakers
Proceedings of the RaPID Workshop - Resources and ProcessIng of linguistic, para-linguistic and extra-linguistic Data from people with various forms of cognitive/psychiatric/developmental impairments - within the 13th Language Resources and Evaluation Conference
Training classification models on clinical speech is a time-saving and effective solution for many healthcare challenges, such as screening for Alzheimer’s Disease over the phone. One of the primary limiting factors of the success of artificial intelligence (AI) solutions is the amount of relevant data available. Clinical data is expensive to collect, not sufficient for large-scale machine learning or neural methods, and often not shareable between institutions due to data protection laws. With the increasing demand for AI in health systems, generating synthetic clinical data that maintains the nuance of underlying patient pathology is the next pressing task. Previous work has shown that automated evaluation of clinical speech tasks via automatic speech recognition (ASR) is comparable to manually annotated results in diagnostic scenarios even though ASR systems produce errors during the transcription process. In this work, we propose to generate synthetic clinical data by simulating ASR deletion errors on the transcript to produce additional data. We compare the synthetic data to the real data with traditional machine learning methods to test the feasibility of the proposed method. Using a dataset of 50 cognitively impaired and 50 control Dutch speakers, ten additional data points are synthetically generated for each subject, increasing the training size from 100 to 1000 training points. We find consistent and comparable performance between models trained only on synthetic data (AUC=0.77) and models trained on real data (AUC=0.77) in a variety of traditional machine learning scenarios. Additionally, linear models are not able to distinguish between real and synthetic data.
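The augmentation idea in the abstract above (dropping words from a transcript to imitate ASR deletion errors, ten synthetic copies per subject) could be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the independent per-token deletion model, and the `deletion_rate` value are all assumptions, since the abstract does not specify the error model.

```python
import random


def simulate_asr_deletions(words, deletion_rate, rng):
    """Drop each token independently with probability `deletion_rate`.

    Hypothetical error model: the abstract does not say how deletions
    are sampled, so independent per-token deletion is assumed here.
    """
    kept = [w for w in words if rng.random() >= deletion_rate]
    # Keep at least one token so downstream features stay defined.
    if not kept and words:
        kept = [rng.choice(words)]
    return kept


def augment(transcript, n_copies=10, deletion_rate=0.1, seed=0):
    """Generate `n_copies` synthetic transcripts from one real transcript."""
    rng = random.Random(seed)  # seeded for reproducibility
    words = transcript.split()
    return [
        " ".join(simulate_asr_deletions(words, deletion_rate, rng))
        for _ in range(n_copies)
    ]


if __name__ == "__main__":
    # Ten synthetic variants of one (made-up) verbal fluency transcript.
    for line in augment("cat dog horse cow sheep goat", deletion_rate=0.3):
        print(line)
```

Each synthetic transcript is a subsequence of the original, so word order is preserved and only the amount of retained material varies.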
2021
Multilingual Learning for Mild Cognitive Impairment Screening from a Clinical Speech Task
Hali Lindsay | Philipp Müller | Insa Kröger | Johannes Tröger | Nicklas Linz | Alexandra Konig | Radia Zeghari | Frans RJ Verhey | Inez HGB Ramakers
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)
The Semantic Verbal Fluency Task (SVF) is an efficient and minimally invasive speech-based screening tool for Mild Cognitive Impairment (MCI). In the SVF, testees have to produce as many words for a given semantic category as possible within 60 seconds. State-of-the-art approaches for automatic evaluation of the SVF employ word embeddings to analyze semantic similarities in these word sequences. While these approaches have proven promising in a variety of test languages, the small amount of data available for any given language limits their performance. In this paper, we investigate, for the first time, multilingual learning approaches for MCI classification from the SVF in order to combat data scarcity. To allow for cross-language generalisation, these approaches either rely on translation to a shared language or make use of several distinct word embeddings. In evaluations on a multilingual corpus of older French, Dutch, and German participants (Controls=66, MCI=66), we show that our multilingual approaches clearly improve over single-language baselines.
Dissociating Semantic and Phonemic Search Strategies in the Phonemic Verbal Fluency Task in early Dementia
Hali Lindsay | Philipp Müller | Nicklas Linz | Radia Zeghari | Mario Magued Mina | Alexandra Konig | Johannes Tröger
Proceedings of the Seventh Workshop on Computational Linguistics and Clinical Psychology: Improving Access
Effective management of dementia hinges on timely detection and precise diagnosis of the underlying cause of the syndrome at an early mild cognitive impairment (MCI) stage. Verbal fluency tasks are among the most often applied tests for early dementia detection due to their efficiency and ease of use. In these tasks, participants are asked to produce as many words as possible belonging to either a semantic category (SVF task) or a phonemic category (PVF task). Even though the SVF and PVF share neurocognitive function profiles, the PVF is typically believed to be less sensitive to MCI-related cognitive impairment, and recent research on fine-grained automatic evaluation of VF tasks has mainly focused on the SVF. Contrary to this belief, we show that by applying state-of-the-art semantic and phonemic distance metrics in automatic analysis of PVF word productions, in-depth conclusions about the production strategy of MCI patients are possible. Our results reveal a dissociation between semantically and phonemically guided search processes in the PVF. Specifically, we show that subjects with MCI rely less on semantic and more on phonemic processes to guide their word production as compared to healthy controls (HC). We further show that semantic similarity-based features improve automatic MCI versus HC classification by 29% over previous approaches for the PVF. As such, these results point towards the yet underexplored utility of the PVF for in-depth assessment of cognition in MCI.
2019
Multilingual prediction of Alzheimer’s disease through domain adaptation and concept-based language modelling
Kathleen C. Fraser | Nicklas Linz | Bai Li | Kristina Lundholm Fors | Frank Rudzicz | Alexandra König | Jan Alexandersson | Philippe Robert | Dimitrios Kokkinakis
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)
There is growing evidence that changes in speech and language may be early markers of dementia, but much of the previous NLP work in this area has been limited by the size of the available datasets. Here, we compare several methods of domain adaptation to augment a small French dataset of picture descriptions (n = 57) with a much larger English dataset (n = 550), for the task of automatically distinguishing participants with dementia from controls. The first challenge is to identify a set of features that transfer across languages; in addition to previously used features based on information units, we introduce a new set of features to model the order in which information units are produced by dementia patients and controls. These concept-based language model features improve classification performance in both English and French separately, and the best result (AUC = 0.89) is achieved using the multilingual training set with a combination of information and language model features.
The importance of sharing patient-generated clinical speech and language data
Kathleen C. Fraser | Nicklas Linz | Hali Lindsay | Alexandra König
Proceedings of the Sixth Workshop on Computational Linguistics and Clinical Psychology
Increased access to large datasets has driven progress in NLP. However, most computational studies of clinically-validated, patient-generated speech and language involve very few datapoints, as such data are difficult (and expensive) to collect. In this position paper, we argue that we must find ways to promote data sharing across research groups, in order to build datasets of a more appropriate size for NLP and machine learning analysis. We review the benefits and challenges of sharing clinical language data, and suggest several concrete actions by both clinical and NLP researchers to encourage multi-site and multi-disciplinary data sharing. We also propose the creation of a collaborative data sharing platform, to allow NLP researchers to take a more active responsibility for data transcription, annotation, and curation.
Temporal Analysis of the Semantic Verbal Fluency Task in Persons with Subjective and Mild Cognitive Impairment
Nicklas Linz | Kristina Lundholm Fors | Hali Lindsay | Marie Eckerström | Jan Alexandersson | Dimitrios Kokkinakis
Proceedings of the Sixth Workshop on Computational Linguistics and Clinical Psychology
The Semantic Verbal Fluency (SVF) task is a classical neuropsychological assessment where persons are asked to produce words belonging to a semantic category (e.g., animals) in a given time. This paper introduces a novel method of temporal analysis for SVF tasks utilizing time intervals and applies it to a corpus of elderly Swedish subjects (mild cognitive impairment, subjective cognitive impairment and healthy controls). A general decline in word count and lexical frequency over the course of the task is revealed, as well as an increase in word transition times. Persons with subjective cognitive impairment had a higher word count during the last intervals, but produced words of the same lexical frequencies. Persons with MCI had a steeper decline in both word count and lexical frequencies during the third interval. Additional correlations with neuropsychological scores suggest these findings are linked to a person’s overall vocabulary size and processing speed, respectively. Classification results improved when adding the novel features (AUC=0.72), supporting their diagnostic value.
Automatic Data-Driven Approaches for Evaluating the Phonemic Verbal Fluency Task with Healthy Adults
Hali Lindsay | Nicklas Linz | Johannes Troeger | Jan Alexandersson
Proceedings of the 3rd International Conference on Natural Language and Speech Processing
2018
The Metalogue Debate Trainee Corpus: Data Collection and Annotations
Volha Petukhova | Andrei Malchanau | Youssef Oualil | Dietrich Klakow | Saturnino Luz | Fasih Haider | Nick Campbell | Dimitris Koryzis | Dimitris Spiliotopoulos | Pierre Albert | Nicklas Linz | Jan Alexandersson
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)
2017
Using Neural Word Embeddings in the Analysis of the Clinical Semantic Verbal Fluency Task
Nicklas Linz | Johannes Tröger | Jan Alexandersson | Alexandra König
Proceedings of the 12th International Conference on Computational Semantics (IWCS) — Short papers