Abstract
The Potsdam Commentary Corpus is a collection of 175 German newspaper commentaries annotated on a variety of layers. This paper introduces a new layer that covers the linguistic notion of information-structural topic (not to be confused with ‘topic’ as applied to documents in information retrieval). To our knowledge, this is the first larger topic-annotated resource for German (and one of the first for any language). We describe the annotation guidelines, the annotation process, and the results of an inter-annotator agreement study, which compare favourably with related work. The annotated corpus is freely available for research.
- Anthology ID:
- L16-1271
- Volume:
- Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
- Month:
- May
- Year:
- 2016
- Address:
- Portorož, Slovenia
- Editors:
- Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Sara Goggi, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Helene Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
- Venue:
- LREC
- Publisher:
- European Language Resources Association (ELRA)
- Pages:
- 1718–1723
- URL:
- https://aclanthology.org/L16-1271
- Cite (ACL):
- Manfred Stede and Sara Mamprin. 2016. Information structure in the Potsdam Commentary Corpus: Topics. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), pages 1718–1723, Portorož, Slovenia. European Language Resources Association (ELRA).
- Cite (Informal):
- Information structure in the Potsdam Commentary Corpus: Topics (Stede & Mamprin, LREC 2016)
- PDF:
- https://preview.aclanthology.org/revert-3132-ingestion-checklist/L16-1271.pdf