National Library as Corpus: DeLiKo-2025@DNB – a Very Large Corpus of German-language Contemporary Literature

Marc Kupietz, Nils Diewald, Philippe Genêt, Andreas Witt


Abstract
This paper introduces DeLiKo-2025@DNB, a very large, linguistically annotated corpus of German-language contemporary literature, freely accessible via https://korap.dnb.de/. The corpus currently comprises 21 billion words from over 287,000 books published between 2005 and the present, ranging from pulp and genre fiction to literary award-winning works. It covers the entire holdings of EPUB-format fiction ebooks deposited with the German National Library (DNB). We provide a detailed account of the corpus composition, metadata, and key features. We also explain our strategy for enabling lawful and effective access by deploying the open-source corpus analysis platform KorAP at the DNB. Finally, we discuss how our approach and work can be transferred to other national libraries, and we outline ongoing and planned extensions and enhancements.
Anthology ID:
2026.lrec-main.518
Volume:
Proceedings of the Fifteenth Language Resources and Evaluation Conference
Month:
May
Year:
2026
Address:
Palma de Mallorca, Spain
Editors:
Stelios Piperidis, Núria Bel, Henk van den Heuvel, Nancy Ide, Simon Krek, Antonio Toral
Venue:
LREC
Publisher:
ELRA Language Resource Association
Pages:
6528–6535
URL:
https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.518/
Cite (ACL):
Marc Kupietz, Nils Diewald, Philippe Genêt, and Andreas Witt. 2026. National Library as Corpus: DeLiKo-2025@DNB – a Very Large Corpus of German-language Contemporary Literature. International Conference on Language Resources and Evaluation, main:6528–6535.
Cite (Informal):
National Library as Corpus: DeLiKo-2025@DNB – a Very Large Corpus of German-language Contemporary Literature (Kupietz et al., LREC 2026)
PDF:
https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.518.pdf