Yurie Iribe


2026

Recent advances in dialogue systems have been remarkable; however, conversational breakdowns still occur, making it essential to develop appropriate repair strategies. Nevertheless, when a system breakdown actually occurs, it remains unclear how the system should perform the repair, and no corpus has been available to investigate this issue. To address this gap, we presented typical examples of system-induced dialogue breakdowns to crowd workers and collected their expected repair utterances toward the broken system. Each repair utterance was annotated with dialogue act tags, and we constructed a breakdown-repair corpus consisting of 3,990 utterances covering ten representative types of breakdowns. This corpus includes breakdown cases across diverse situations, allowing for the examination of various repair patterns. Furthermore, we also conducted a questionnaire on participants’ personal traits, creating a dataset that enables the investigation of repair strategies tailored to individual user characteristics. In this paper, we report an overview of the dataset and preliminary analysis results.

2022

There is a need for a simple method of detecting early signs of dementia which is not burdensome to patients, since early diagnosis and treatment can often slow the advance of the disease. Several studies have explored using only the acoustic and linguistic information of conversational speech as diagnostic material, with some success. To accelerate this research, we recorded natural conversations between 128 elderly people living in four different regions of Japan and interviewers, who also administered the Hasegawa’s Dementia Scale-Revised (HDS-R), a cognitive impairment test. Using our elderly speech corpus and dementia test results, we propose an SVM-based screening method which can detect dementia using the acoustic features of conversational speech even when regional dialects are present. We accomplish this by omitting some acoustic features, to limit the negative effect of differences between dialects. When using our proposed method, a dementia detection accuracy rate of about 91% was achieved for speakers from two regions. When speech from four regions was used in a second experiment, the discrimination rate fell to 76.6%, but this may have been due to using only sentence-level acoustic features in the second experiment, instead of sentence and phoneme-level features as in the previous experiment. This is an on-going research project, and additional investigation is needed to understand differences in the acoustic characteristics of phoneme units in the conversational speech collected from these four regions, to determine whether the removal of formants and other features can improve the dementia detection rate.

2020

In an aging society like Japan, a highly accurate speech recognition system is needed for use in electronic devices for the elderly, but this level of accuracy cannot be obtained using conventional speech recognition systems due to the unique features of the speech of elderly people. S-JNAS, a corpus of elderly Japanese speech, is widely used for acoustic modeling in Japan, but the average age of its speakers is 67.6 years old. Since average life expectancy in Japan is now 84.2 years, we are constructing a new speech corpus, which currently consists of the utterances of 221 speakers with an average age of 79.2, collected from four regions of Japan. In addition, we expand on our previous study (Fukuda, 2019) by further investigating the construction of acoustic models suitable for elderly speech. We create new acoustic models and train them using a combination of existing Japanese speech corpora (JNAS, S-JNAS, CSJ), with and without our ‘super-elderly’ speech data, and conduct speech recognition experiments. Our new acoustic models achieve word error rates (WER) as low as 13.38%, exceeding the results of our previous study in which we used the CSJ acoustic model adapted for elderly speech (17.4% WER).

2016

We have constructed a new speech data corpus, using the utterances of 100 elderly Japanese people, to improve speech recognition accuracy of the speech of older people. Humanoid robots are being developed for use in elder care nursing homes. Interaction with such robots is expected to help maintain the cognitive abilities of nursing home residents, as well as providing them with companionship. In order for these robots to interact with elderly people through spoken dialogue, a high performance speech recognition system for speech of elderly people is needed. To develop such a system, we recorded speech uttered by 100 elderly Japanese, most of them are living in nursing homes, with an average age of 77.2. Previously, a seniors’ speech corpus named S-JNAS was developed, but the average age of the participants was 67.6 years, but the target age for nursing home care is around 75 years old, much higher than that of the S-JNAS samples. In this paper we compare our new corpus with an existing Japanese read speech corpus, JNAS, which consists of adult speech, and with the above mentioned S-JNAS, the senior version of JNAS.