WAPUSK20 - A Database for Robust Audiovisual Speech Recognition

Alexander Vorwerk, Xiaohui Wang, Dorothea Kolossa, Steffen Zeiler, Reinhold Orglmeister


Abstract
Audiovisual speech recognition (AVSR) systems have been proven superior over audio-only speech recognizers in noisy environments by incorporating features of the visual modality. In order to develop reliable AVSR systems, appropriate simultaneously recorded speech and video data is needed. In this paper, we will introduce a corpus (WAPUSK20) that consists of audiovisual data of 20 speakers uttering 100 sentences each with four channels of audio and a stereoscopic video. The latter is intended to support more accurate lip tracking and the development of stereo data based normalization techniques for greater robustness of the recognition results. The sentence design has been adopted from the GRID corpus that has been widely used for AVSR experiments. Recordings have been made under acoustically realistic conditions in a usual office room. Affordable hardware equipment has been used, such as a pre-calibrated stereo camera and standard PC components. The software written to create this corpus was designed in MATLAB with help of hardware specific software provided by the hardware manufacturers and freely available open source software.
Anthology ID:
L10-1366
Volume:
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)
Month:
May
Year:
2010
Address:
Valletta, Malta
Editors:
Nicoletta Calzolari, Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Mike Rosner, Daniel Tapias
Venue:
LREC
SIG:
Publisher:
European Language Resources Association (ELRA)
Note:
Pages:
Language:
URL:
http://www.lrec-conf.org/proceedings/lrec2010/pdf/533_Paper.pdf
DOI:
Bibkey:
Cite (ACL):
Alexander Vorwerk, Xiaohui Wang, Dorothea Kolossa, Steffen Zeiler, and Reinhold Orglmeister. 2010. WAPUSK20 - A Database for Robust Audiovisual Speech Recognition. In Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10), Valletta, Malta. European Language Resources Association (ELRA).
Cite (Informal):
WAPUSK20 - A Database for Robust Audiovisual Speech Recognition (Vorwerk et al., LREC 2010)
Copy Citation:
PDF:
http://www.lrec-conf.org/proceedings/lrec2010/pdf/533_Paper.pdf