The IFADV Corpus: a Free Dialog Video Corpus

Rob van Son, Wieneke Wesseling, Eric Sanders, Henk van den Heuvel


Abstract
Research into spoken language has become more visual over the years. Both fundamental and applied research have progressively included gestures, gaze, and facial expression. Corpora of multi-modal conversational speech are rare and frequently difficult to use due to privacy and copyright restrictions. A freely available annotated corpus is presented, gratis and libre, of high quality video recordings of face-to-face conversational speech. Annotations include orthography, POS tags, and automatically generated phonemes transcriptions and word boundaries. In addition, labeling of both simple conversational function and gaze direction has been a performed. Within the bounds of the law, everything has been done to remove copyright and use restrictions. Annotations have been processed to RDBMS tables that allow SQL queries and direct connections to statistical software. From our experiences we would like to advocate the formulation of “best practises” for both legal handling and database storage of recordings and annotations.
Anthology ID:
L08-1219
Volume:
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)
Month:
May
Year:
2008
Address:
Marrakech, Morocco
Venue:
LREC
SIG:
Publisher:
European Language Resources Association (ELRA)
Note:
Pages:
Language:
URL:
http://www.lrec-conf.org/proceedings/lrec2008/pdf/132_paper.pdf
DOI:
Bibkey:
Cite (ACL):
Rob van Son, Wieneke Wesseling, Eric Sanders, and Henk van den Heuvel. 2008. The IFADV Corpus: a Free Dialog Video Corpus. In Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08), Marrakech, Morocco. European Language Resources Association (ELRA).
Cite (Informal):
The IFADV Corpus: a Free Dialog Video Corpus (van Son et al., LREC 2008)
Copy Citation:
PDF:
http://www.lrec-conf.org/proceedings/lrec2008/pdf/132_paper.pdf