A Multimodal Educational Corpus of Oral Courses: Annotation, Analysis and Case Study
Salima Mdhaffar, Yannick Estève, Antoine Laurent, Nicolas Hernandez, Richard Dufour, Delphine Charlet, Geraldine Damnati, Solen Quiniou, Nathalie Camelin
Abstract
This corpus is part of the PASTEL (Performing Automated Speech Transcription for Enhancing Learning) project, which aims to explore the potential of synchronous speech transcription and its application in specific teaching situations. It includes 10 hours of different lectures, manually transcribed and segmented. The main interest of this corpus lies in its multimodal aspect: in addition to speech, the courses were filmed and the written presentation supports (slides) are made available. The dataset may therefore serve research in multiple fields, from speech and language to image and video processing. The dataset will be freely available to the research community. In this paper, we first describe in detail the annotation protocol, including a detailed analysis of the manually labeled data. Then, we propose some possible use cases of the corpus with baseline results. The use cases concern scientific fields from both speech and text processing, with language model adaptation, thematic segmentation and transcription-to-slide alignment.
- Anthology ID:
- 2020.lrec-1.529
- Volume:
- Proceedings of the Twelfth Language Resources and Evaluation Conference
- Month:
- May
- Year:
- 2020
- Address:
- Marseille, France
- Venue:
- LREC
- Publisher:
- European Language Resources Association
- Pages:
- 4293–4301
- Language:
- English
- URL:
- https://aclanthology.org/2020.lrec-1.529
- Cite (ACL):
- Salima Mdhaffar, Yannick Estève, Antoine Laurent, Nicolas Hernandez, Richard Dufour, Delphine Charlet, Geraldine Damnati, Solen Quiniou, and Nathalie Camelin. 2020. A Multimodal Educational Corpus of Oral Courses: Annotation, Analysis and Case Study. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 4293–4301, Marseille, France. European Language Resources Association.
- Cite (Informal):
- A Multimodal Educational Corpus of Oral Courses: Annotation, Analysis and Case Study (Mdhaffar et al., LREC 2020)
- PDF:
- https://preview.aclanthology.org/remove-xml-comments/2020.lrec-1.529.pdf