Elmar Nöth

Also published as: E. Nöth, Elmar Noth

2019

Automated Cross-language Intelligibility Analysis of Parkinson’s Disease Patients Using Speech Recognition Technologies
Nina Hosseini-Kivanani | Juan Camilo Vásquez-Correa | Manfred Stede | Elmar Nöth
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop

Speech deficits are common symptoms among Parkinson’s Disease (PD) patients. The automatic assessment of speech signals is promising for the evaluation of the neurological state and the speech quality of the patients. Recently, progress has been made in applying machine learning and computational methods to automatically evaluate the speech of PD patients. In the present study, we plan to analyze the speech signals of PD patients and healthy control (HC) subjects in three different languages: German, Spanish, and Czech, with the aim of identifying biomarkers to discriminate between PD patients and HC subjects and to evaluate the neurological state of the patients. Therefore, the main contribution of this study is the automatic classification of PD patients and HC subjects in different languages, with a focus on phonation, articulation, and prosody. We will focus on an intelligibility analysis based on automatic speech recognition systems trained on these three languages. This is one of the first studies to consider the evaluation of the speech of PD patients in different languages. The purpose of this research proposal is to build a model that can discriminate between PD and HC subjects even when the languages used for training and testing are different.

2016

In this paper, we describe a new database with audio recordings of non-native (L2) speakers of English, and the perceptual evaluation experiment conducted with native English speakers for assessing the prosody of each recording. These annotations are then used to compute the gold standard using different methods, and a series of regression experiments is conducted to evaluate their impact on the performance of a regression model predicting the degree of naturalness of L2 speech. Further, we compare the relevance of different feature groups modelling prosody in general (without speech tempo), speech rate and pauses modelling speech tempo (fluency), voice quality, and a variety of spectral features. We also discuss the impact of various fusion strategies on performance. Overall, our results demonstrate that the prosody of non-native speakers of English as L2 can be reliably assessed using supra-segmental audio features; prosodic features seem to be the most important ones.

2014

New Spanish speech corpus database for the analysis of people suffering from Parkinson’s disease
Juan Rafael Orozco-Arroyave | Julián David Arias-Londoño | Jesús Francisco Vargas-Bonilla | María Claudia González-Rátiva | Elmar Nöth
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

Parkinson’s disease (PD) is the second most prevalent neurodegenerative disorder after Alzheimer’s, affecting about 1% of people older than 65, and about 89% of people with PD develop different speech disorders. Different researchers are currently working on the analysis of the speech of people with PD, including the study of different dimensions of speech such as phonation, articulation, prosody and intelligibility. The study of phonation and articulation has been addressed mainly considering sustained vowels; however, the analysis of prosody and intelligibility requires the inclusion of words, sentences and monologue. In this paper we present a new database with speech recordings of 50 patients with PD and their respective healthy controls, matched by age and gender. All of the participants are native Spanish speakers, and the recordings were collected following a protocol that considers both technical requirements and several recommendations given by experts in linguistics, phoniatry and neurology. This corpus includes tasks such as sustained phonations of the vowels, diadochokinetic evaluation, 45 words, 10 sentences, a reading text and a monologue. The paper also includes results of the characterization of the Spanish vowels considering different measures used in other works to characterize different speech impairments.

Erlangen-CLP: A Large Annotated Corpus of Speech from Children with Cleft Lip and Palate
Tobias Bocklet | Andreas Maier | Korbinian Riedhammer | Ulrich Eysholdt | Elmar Nöth
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

In this paper we describe Erlangen-CLP, a large speech database of children with Cleft Lip and Palate. More than 800 German children with CLP (most of them between 4 and 18 years old) and 380 age-matched control speakers spoke the semi-standardized PLAKSS test, which consists of words with all German phonemes in different positions. So far, 250 CLP speakers have been manually transcribed; 120 of these were analyzed by a speech therapist and 27 of them by four additional therapists. The therapists marked 6 different processes/criteria, such as pharyngeal backing and hypernasality, which typically occur in the speech of people with CLP. We present detailed statistics about the marked processes and the inter-rater agreement.

2010

FAU IISAH Corpus – A German Speech Database Consisting of Human-Machine and Human-Human Interaction Acquired by Close-Talking and Far-Distance Microphones
Werner Spiegl | Korbinian Riedhammer | Stefan Steidl | Elmar Nöth
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

In this paper the FAU IISAH corpus and its recording conditions are described: a new speech database consisting of human-machine and human-human interaction recordings. Besides close-talking microphones for the best possible audio quality of the recorded speech, far-distance microphones were used to acquire the interaction and communication. The recordings took place during a Wizard-of-Oz experiment in the intelligent, senior-adapted house (ISA-House). This is a living room with a speech-controlled home assistance system for elderly people, based on a dialogue system that is able to process spontaneous speech. During the studies in the ISA-House, more than eight hours of interaction data were recorded, including 3 hours and 27 minutes of spontaneous speech. The data were annotated in terms of human-human (off-talk) and human-machine (on-talk) interaction. The test persons used 2891 turns of off-talk and 2752 turns of on-talk, including 1751 different words. The analysis of statistical and linguistic aspects is still in progress.

2004

This paper deals with databases that combine different aspects: children's speech, emotional speech, human-robot communication, cross-linguistics, and read vs. spontaneous speech. In a Wizard-of-Oz scenario, German and English children had to instruct Sony's AIBO robot to fulfil specific tasks. In one experimental condition, strictly parallel for German and English, the AIBO behaved 'disobediently' by following its own script irrespective of the child's commands. In this way, reactions of different children to the same sequence of AIBO's actions could be obtained. In addition, both the German and the English children were recorded reading texts. The data are transliterated orthographically; emotional user states and some other phenomena will be annotated. We report preliminary word recognition rates and classification results.

2000

Labeling of Prosodic Events in Slovenian Speech Database GOPOLIS
France Mihelič | Jerneja Gros | Elmar Nöth | Volker Warnke
Proceedings of the Second International Conference on Language Resources and Evaluation (LREC’00)

1996