Creation of Learner Corpus and Its Application to Speech Recognition
Hiroki Yamazaki, Keisuke Kitamura, Takashi Harada, Seiichi Yamamoto
Abstract
Some big languages like English are spoken by a lot of people whose mother tongues are different from. Their second languages often have not only distinct accent but also different lexical and syntactic characteristics. Speech recognition performance is severely affected when the lexical, syntactic, or semantic characteristics in the training and recognition tasks differ. Language model of a speech recognition system is usually trained with transcribed speech data or text data collected in English native countries, therefore, speech recognition performance is expected to be degraded by mismatch of lexical and syntactic characteristics between native speakers and second language speakers as well as the distinction between their accents. The aim of language model adaptation is to exploit specific, albeit limited, knowledge about the recognition task to compensate for mismatch of the lexical, syntactic, or semantic characteristics. This paper describes whether the language model adaptation is effective for compensating for the mismatch between the lexical, syntactic, or semantic characteristics of native speakers and second language speakers.- Anthology ID:
- L08-1520
- Volume:
- Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)
- Month:
- May
- Year:
- 2008
- Address:
- Marrakech, Morocco
- Editors:
- Nicoletta Calzolari, Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Daniel Tapias
- Venue:
- LREC
- SIG:
- Publisher:
- European Language Resources Association (ELRA)
- Note:
- Pages:
- Language:
- URL:
- http://www.lrec-conf.org/proceedings/lrec2008/pdf/39_paper.pdf
- DOI:
- Cite (ACL):
- Hiroki Yamazaki, Keisuke Kitamura, Takashi Harada, and Seiichi Yamamoto. 2008. Creation of Learner Corpus and Its Application to Speech Recognition. In Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08), Marrakech, Morocco. European Language Resources Association (ELRA).
- Cite (Informal):
- Creation of Learner Corpus and Its Application to Speech Recognition (Yamazaki et al., LREC 2008)
- PDF:
- http://www.lrec-conf.org/proceedings/lrec2008/pdf/39_paper.pdf