Pavel Campr
2008
Design and Recording of Czech Audio-Visual Database with Impaired Conditions for Continuous Speech Recognition
Jana Trojanová
|
Marek Hrúz
|
Pavel Campr
|
Miloš Železný
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)
In this paper we discuss the design, acquisition and preprocessing of a Czech audio-visual speech corpus. The corpus is intended for training and testing of existing audio-visual speech recognition system. The name of the database is UWB-07-ICAVR, where ICAVR stands for Impaired Condition Audio Visual speech Recognition. The corpus consists of 10,000 utterances of continuous speech obtained from 50 speakers. The total length of the database is 25 hours. Each utterance is stored as a separate sentence. The corpus extends existing databases by covering condition of variable illumination. We acquired 50 speakers, where half of them were men and half of them were women. Recording was done by two cameras and two microphones. Database introduced in this paper can be used for testing of visual parameterization in audio-visual speech recognition (AVSR). Corpus can be easily split into training and testing part. Each speaker pronounced 200 sentences: first 50 were the same for all, the rest of them were different. Six types of illumination were covered. Session for one speaker can fit on one DVD disk. All files are accompanied by visual labels. Labels specify region of interest (mouth and area around them specified by bounding box). Actual pronunciation of each sentence is transcribed into the text file.
Collection and Preprocessing of Czech Sign Language Corpus for Sign Language Recognition
Pavel Campr
|
Marek Hrúz
|
Jana Trojanová
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)
This paper discusses the design, recording and preprocessing of a Czech sign language corpus. The corpus is intended for training and testing of sign language recognition (SLR) systems. The UWB-07-SLR-P corpus contains video data of 4 signers recorded from 3 different perspectives. Two of the perspectives contain whole body and provide 3D motion data, the third one is focused on signers face and provide data for face expression and lip feature extraction. Each signer performed 378 signs with 5 repetitions. The corpus consists of several types of signs: numbers (35 signs), one and two-handed finger alphabet (64), town names (35) and other signs (244). Each sign is stored in a separate AVI file. In total the corpus consists of 21853 video files in total length of 11.1 hours. Additionally each sign is preprocessed and basic features such as 3D hand and head trajectories are available. The corpus is mainly focused on feature extraction and isolated SLR rather than continuous SLR experiments.