An Annotated Corpus of Direct Speech

John Lee, Chak Yan Yeung


Abstract
We propose a scheme for annotating direct speech in literary texts, based on the Text Encoding Initiative (TEI) and the coreference annotation guidelines from the Message Understanding Conference (MUC). The scheme encodes the speakers and listeners of the utterances in a text, as well as the quotative verbs that report those utterances. We measure inter-annotator agreement on this annotation task. We then present statistics on a manually annotated corpus consisting of books from the New Testament. Finally, we visualize the corpus as a conversational network.
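Because the scheme records a speaker and listener for each utterance, a conversational network can be derived by counting who addresses whom. The sketch below is a minimal illustration under stated assumptions. It assumes each annotated utterance has already been reduced to a hypothetical (speaker, listener) pair, with an utterance addressed to several listeners contributing one pair per listener. The functions `build_network` and `edges` and the toy names "A", "B", "C" are illustrative only; this is not the authors' TEI/MUC markup or their visualization code, and the toy data is not drawn from the corpus.

```python
from collections import Counter, defaultdict

# Hypothetical input format: one (speaker, listener) pair per annotated utterance.
# Real pairs would first have to be extracted from the TEI/MUC-style annotation.
Utterance = tuple[str, str]


def build_network(utterances: list[Utterance]) -> dict[str, Counter]:
    """Map each speaker to a Counter of listeners, weighted by utterance count."""
    network: dict[str, Counter] = defaultdict(Counter)
    for speaker, listener in utterances:
        network[speaker][listener] += 1
    return dict(network)


def edges(network: dict[str, Counter]) -> list[tuple[str, str, int]]:
    """Flatten the network into sorted (speaker, listener, weight) triples."""
    return sorted(
        (speaker, listener, weight)
        for speaker, counts in network.items()
        for listener, weight in counts.items()
    )


if __name__ == "__main__":
    toy = [("A", "B"), ("B", "A"), ("A", "B"), ("C", "A")]  # illustrative pairs only
    for speaker, listener, weight in edges(build_network(toy)):
        print(f"{speaker} -> {listener}: {weight}")
```

Keeping edges directed preserves the asymmetry between speaking and being addressed; the weighted triples can then be handed to any graph-drawing tool.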
Anthology ID: L16-1168
Volume: Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
Month: May
Year: 2016
Address: Portorož, Slovenia
Editors: Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Sara Goggi, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Helene Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue: LREC
Publisher: European Language Resources Association (ELRA)
Pages: 1059–1063
URL: https://aclanthology.org/L16-1168
Cite (ACL): John Lee and Chak Yan Yeung. 2016. An Annotated Corpus of Direct Speech. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), pages 1059–1063, Portorož, Slovenia. European Language Resources Association (ELRA).
Cite (Informal): An Annotated Corpus of Direct Speech (Lee & Yeung, LREC 2016)
PDF: https://preview.aclanthology.org/fix-dup-bibkey/L16-1168.pdf