The Potsdam Commentary Corpus 2.2: Extending Annotations for Shallow Discourse Parsing

Peter Bourgonje, Manfred Stede


Abstract
We present the Potsdam Commentary Corpus 2.2, a German corpus of news editorials annotated on several levels. New in version 2.2 are two additional annotation layers for coherence relations, following the Penn Discourse TreeBank (PDTB) framework. Specifically, we add relation senses to an existing layer of discourse connectives and their arguments, and we introduce a new layer with additional coherence relation types, resulting in a German corpus that mirrors the PDTB. The aim is to increase the usability of the corpus for shallow discourse parsing. In this paper, we provide inter-annotator agreement figures for the new annotations and compare corpus statistics based on them to the equivalent statistics extracted from the PDTB.
Anthology ID:
2020.lrec-1.133
Volume:
Proceedings of the Twelfth Language Resources and Evaluation Conference
Month:
May
Year:
2020
Address:
Marseille, France
Editors:
Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association
Pages:
1061–1066
Language:
English
URL:
https://aclanthology.org/2020.lrec-1.133
Cite (ACL):
Peter Bourgonje and Manfred Stede. 2020. The Potsdam Commentary Corpus 2.2: Extending Annotations for Shallow Discourse Parsing. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 1061–1066, Marseille, France. European Language Resources Association.
Cite (Informal):
The Potsdam Commentary Corpus 2.2: Extending Annotations for Shallow Discourse Parsing (Bourgonje & Stede, LREC 2020)
PDF:
https://preview.aclanthology.org/emnlp-22-attachments/2020.lrec-1.133.pdf