The OpenCourseWare Metadiscourse (OCWMD) Corpus

Ghada Alharbi, Thomas Hain


Abstract
This study describes a new corpus of over 60,000 hand-annotated metadiscourse acts from 106 OpenCourseWare lectures, drawn from two disciplines: Physics and Economics. Metadiscourse is a set of linguistic expressions that signal different functions in the discourse. This type of language is hypothesised to be helpful in finding structure in unstructured text, such as lecture discourse. A brief summary is provided of the annotation scheme and labelling procedures, inter-annotator reliability statistics, overall distributional statistics, a description of auxiliary data that will be distributed with the corpus, and information on how to obtain the data. The results provide a deeper understanding of lecture structure and confirm that metadiscursive acts in academic lectures can be coded reliably across different disciplines. The next stage of our research will be to build a classification model to automate the tagging process, replacing manual annotation, which takes time and effort. We also plan to use these tags as indicators of the higher-level structure of lecture discourse.
Anthology ID:
L16-1279
Volume:
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
Month:
May
Year:
2016
Address:
Portorož, Slovenia
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Sara Goggi, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Helene Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
1770–1776
URL:
https://aclanthology.org/L16-1279
Cite (ACL):
Ghada Alharbi and Thomas Hain. 2016. The OpenCourseWare Metadiscourse (OCWMD) Corpus. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), pages 1770–1776, Portorož, Slovenia. European Language Resources Association (ELRA).
Cite (Informal):
The OpenCourseWare Metadiscourse (OCWMD) Corpus (Alharbi & Hain, LREC 2016)
PDF:
https://preview.aclanthology.org/emnlp-22-attachments/L16-1279.pdf