Abstract
Speech technology applications, such as speech recognition, speech synthesis, and speech dialog systems, often require corpora based on highly customized specifications. Existing corpora available to the community, such as TIMIT and other corpora distributed by LDC and ELDA, do not always meet the requirements of such applications. In such cases, the developers need to create their own corpora. The creation of a highly customized speech corpus, however, could be a very expensive and time-consuming task, especially for small organizations. It requires multidisciplinary expertise in linguistics, management and engineering as it involves subtasks such as the corpus design, human subject recruitment, recording, quality assurance, and in some cases, segmentation, transcription and annotation. This paper describes LDC's recent involvement in the creation of a low-cost yet highly-customized speech corpus for a commercial organization under a novel data creation and licensing model, which benefits both the particular data requester and the general linguistic data user community.- Anthology ID:
- L06-1484
- Volume:
- Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)
- Month:
- May
- Year:
- 2006
- Address:
- Genoa, Italy
- Editors:
- Nicoletta Calzolari, Khalid Choukri, Aldo Gangemi, Bente Maegaard, Joseph Mariani, Jan Odijk, Daniel Tapias
- Venue:
- LREC
- SIG:
- Publisher:
- European Language Resources Association (ELRA)
- Note:
- Pages:
- Language:
- URL:
- http://www.lrec-conf.org/proceedings/lrec2006/pdf/776_pdf.pdf
- DOI:
- Cite (ACL):
- Kazuaki Maeda, Christopher Cieri, and Kevin Walker. 2006. Low-cost Customized Speech Corpus Creation for Speech Technology Applications. In Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06), Genoa, Italy. European Language Resources Association (ELRA).
- Cite (Informal):
- Low-cost Customized Speech Corpus Creation for Speech Technology Applications (Maeda et al., LREC 2006)
- PDF:
- http://www.lrec-conf.org/proceedings/lrec2006/pdf/776_pdf.pdf