Transcription Cost Reduction for Constructing Acoustic Models Using Acoustic Likelihood Selection Criteria

Tomoyuki Kato, Tomiki Toda, Hiroshi Saruwatari, Kiyohiro Shikano


Abstract
This paper describes a novel method for reducing the transcription effort required to construct task-adapted acoustic models for a practical automatic speech recognition (ASR) system. Training such models requires collecting actual data samples from the practical system and transcribing them, but transcribing utterances is a time-consuming and laborious process. In the proposed method, we first adapt the initial models to the acoustic environment of the system using a small number of collected data samples with transcriptions. We then automatically select informative training data samples to be transcribed from a large speech corpus based on the acoustic likelihoods computed with these models. We perform several experimental evaluations in the framework of “Takemarukun”, a practical speech-oriented guidance system. Experimental results show that 1) utterance sets with low likelihoods yield better task-adapted models than those with high likelihoods, although the set with the lowest likelihoods degrades performance because it includes outliers, and 2) MLLR adaptation is effective for training the task-adapted models when the amount of transcribed data is small, whereas EM training outperforms MLLR when more than around 10,000 utterances are transcribed.
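The selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `select_for_transcription`, the use of per-frame average log-likelihoods, the `outlier_fraction` cutoff, and the example scores are all assumptions made for illustration. The sketch only reflects the reported finding that low-likelihood utterances are the more useful ones to transcribe, while the very lowest-likelihood set tends to contain outliers.

```python
from typing import Dict, List


def select_for_transcription(
    loglik_per_frame: Dict[str, float],
    budget: int,
    outlier_fraction: float = 0.05,
) -> List[str]:
    """Pick low-likelihood utterances, skipping the very lowest ones.

    loglik_per_frame maps an utterance ID to its average per-frame
    log-likelihood under the environment-adapted models (hypothetical
    input; how the scores are computed is outside this sketch).
    """
    if not 0.0 <= outlier_fraction < 1.0:
        raise ValueError("outlier_fraction must be in [0, 1)")
    if budget < 0:
        raise ValueError("budget must be non-negative")
    # Rank utterance IDs from lowest to highest likelihood (ties broken by ID).
    ranked = sorted(loglik_per_frame, key=lambda u: (loglik_per_frame[u], u))
    # Assumption: treat the lowest-scoring fraction as outliers and skip it.
    n_skip = int(len(ranked) * outlier_fraction)
    # Take the next `budget` utterances as candidates for manual transcription.
    return ranked[n_skip:n_skip + budget]


# Made-up scores for five utterances.
scores = {"utt1": -72.4, "utt2": -95.0, "utt3": -60.1, "utt4": -68.9, "utt5": -81.3}
selected = select_for_transcription(scores, budget=2, outlier_fraction=0.2)
print(selected)
```

In this toy run the single lowest-scoring utterance is set aside as a suspected outlier and the next two lowest are returned. A real system would have to tune the cutoff and the transcription budget empirically.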
Anthology ID:
L06-1200
Volume:
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)
Month:
May
Year:
2006
Address:
Genoa, Italy
Editors:
Nicoletta Calzolari, Khalid Choukri, Aldo Gangemi, Bente Maegaard, Joseph Mariani, Jan Odijk, Daniel Tapias
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
URL:
http://www.lrec-conf.org/proceedings/lrec2006/pdf/344_pdf.pdf
Cite (ACL):
Tomoyuki Kato, Tomiki Toda, Hiroshi Saruwatari, and Kiyohiro Shikano. 2006. Transcription Cost Reduction for Constructing Acoustic Models Using Acoustic Likelihood Selection Criteria. In Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06), Genoa, Italy. European Language Resources Association (ELRA).
Cite (Informal):
Transcription Cost Reduction for Constructing Acoustic Models Using Acoustic Likelihood Selection Criteria (Kato et al., LREC 2006)
PDF:
http://www.lrec-conf.org/proceedings/lrec2006/pdf/344_pdf.pdf