Bikers Accessing the Web: The SmartWeb Motorbike Corpus

Moritz Kaiser, Hannes Mögele, Florian Schiel


Abstract
Three advanced German speech corpora have been collected during theGerman SmartWeb project. One of them, the SmartWeb MotorbikeCorpus (SMC) is described in this paper. As with all SmartWeb speech corpora SMC is designed for a dialogue system dealing with open domains. The corpus is recorded under the special circumstances of a motorbike ride and contains utterances of the driver related to information retrieval from various sources and different topics. Audio tracks show characteristic noise from the engine and surrounding traffic as well as drop outs caused by the transmission over Bluetooth and the UMTS mobile network. We discuss the problems of the technical setup and the fully automatic evocation of natural-spoken queries by means of dialogue-like sequences.
Anthology ID:
L06-1154
Volume:
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)
Month:
May
Year:
2006
Address:
Genoa, Italy
Editors:
Nicoletta Calzolari, Khalid Choukri, Aldo Gangemi, Bente Maegaard, Joseph Mariani, Jan Odijk, Daniel Tapias
Venue:
LREC
SIG:
Publisher:
European Language Resources Association (ELRA)
Note:
Pages:
Language:
URL:
http://www.lrec-conf.org/proceedings/lrec2006/pdf/280_pdf.pdf
DOI:
Bibkey:
Cite (ACL):
Moritz Kaiser, Hannes Mögele, and Florian Schiel. 2006. Bikers Accessing the Web: The SmartWeb Motorbike Corpus. In Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06), Genoa, Italy. European Language Resources Association (ELRA).
Cite (Informal):
Bikers Accessing the Web: The SmartWeb Motorbike Corpus (Kaiser et al., LREC 2006)
Copy Citation:
PDF:
http://www.lrec-conf.org/proceedings/lrec2006/pdf/280_pdf.pdf