The German Reference Corpus DeReKo: A Primordial Sample for Linguistic Research

Marc Kupietz, Cyril Belica, Holger Keibel, Andreas Witt



Abstract
This paper describes DeReKo (Deutsches Referenzkorpus), the Archive of General Reference Corpora of Contemporary Written German at the Institut für Deutsche Sprache (IDS) in Mannheim, and the rationale behind its development. We discuss its design, its legal background, how to access it, available metadata, linguistic annotation layers, underlying standards, ongoing developments, and aspects of using the archive for empirical linguistic research. The focus of the paper is on the advantages of DeReKo's design as a primordial sample from which virtual corpora can be drawn for the specific purposes of individual studies. Both concepts, primordial sample and virtual corpus, are explained and illustrated in detail. Furthermore, we describe in more detail how DeReKo deals with the fact that all its texts are subject to third parties' intellectual property rights, and how it deals with the issue of replicability, which is particularly challenging given DeReKo's dynamic growth and the possibility of constructing an open-ended number of virtual corpora from it.
Anthology ID:
L10-1285
Volume:
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)
Month:
May
Year:
2010
Address:
Valletta, Malta
Editors:
Nicoletta Calzolari, Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Mike Rosner, Daniel Tapias
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
URL:
http://www.lrec-conf.org/proceedings/lrec2010/pdf/414_Paper.pdf
Cite (ACL):
Marc Kupietz, Cyril Belica, Holger Keibel, and Andreas Witt. 2010. The German Reference Corpus DeReKo: A Primordial Sample for Linguistic Research. In Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10), Valletta, Malta. European Language Resources Association (ELRA).
Cite (Informal):
The German Reference Corpus DeReKo: A Primordial Sample for Linguistic Research (Kupietz et al., LREC 2010)