Collection and Preprocessing of Czech Sign Language Corpus for Sign Language Recognition

Pavel Campr, Marek Hrúz, Jana Trojanová


Abstract
This paper discusses the design, recording and preprocessing of a Czech sign language corpus. The corpus is intended for training and testing of sign language recognition (SLR) systems. The UWB-07-SLR-P corpus contains video data of 4 signers recorded from 3 different perspectives. Two of the perspectives contain whole body and provide 3D motion data, the third one is focused on signer’s face and provide data for face expression and lip feature extraction. Each signer performed 378 signs with 5 repetitions. The corpus consists of several types of signs: numbers (35 signs), one and two-handed finger alphabet (64), town names (35) and other signs (244). Each sign is stored in a separate AVI file. In total the corpus consists of 21853 video files in total length of 11.1 hours. Additionally each sign is preprocessed and basic features such as 3D hand and head trajectories are available. The corpus is mainly focused on feature extraction and isolated SLR rather than continuous SLR experiments.
Anthology ID:
L08-1471
Volume:
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)
Month:
May
Year:
2008
Address:
Marrakech, Morocco
Venue:
LREC
SIG:
Publisher:
European Language Resources Association (ELRA)
Note:
Pages:
Language:
URL:
http://www.lrec-conf.org/proceedings/lrec2008/pdf/804_paper.pdf
DOI:
Bibkey:
Cite (ACL):
Pavel Campr, Marek Hrúz, and Jana Trojanová. 2008. Collection and Preprocessing of Czech Sign Language Corpus for Sign Language Recognition. In Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08), Marrakech, Morocco. European Language Resources Association (ELRA).
Cite (Informal):
Collection and Preprocessing of Czech Sign Language Corpus for Sign Language Recognition (Campr et al., LREC 2008)
Copy Citation:
PDF:
http://www.lrec-conf.org/proceedings/lrec2008/pdf/804_paper.pdf