Florian Barth
2026
Text+: A National Hub Including Legacy Language Data
Florian Barth | Christoph Draxler | Jennifer Ecker | Stefan Fischer | Philippe Genêt | Alina Hemmer | Timm Lehmberg | Thorsten Trippel | Andreas Witt | Arden Zimmermann | Claus Zinn
Proceedings of the Fifteenth Language Resources and Evaluation Conference
Text+ is the German distributed research data infrastructure for literary studies, linguistics, and spoken and written language. Its resources consist of contemporary and historical literary and media texts, deeply annotated material, transcripts of spoken and sign language, and original recordings. Text+ provides access to its resources according to the FAIR guidelines: Findable due to standard-conformant metadata, Accessible with single sign-on authentication, Interoperable via open data formats, and Reusable through web services and extensive documentation. The 30+ partners of Text+ are archives, libraries, universities, and other research institutions. The partners are autonomous, and they differ in the amount of data and processing capabilities they provide. In this paper, we describe the hub architecture of Text+, which gives users a central and FAIR point of access to research data that continues to be distributed across the Text+ partner institutions. The architecture serves as a blueprint for evolving research infrastructures that aim at maintaining (and empowering) their research data contributors.
2022
MONAPipe: Modes of Narration and Attribution Pipeline for German Computational Literary Studies and Language Analysis in spaCy
Tillmann Dönicke | Florian Barth | Hanna Varachkina | Caroline Sporleder
Proceedings of the 18th Conference on Natural Language Processing (KONVENS 2022)
Levels of Non-Fictionality in Fictional Texts
Florian Barth | Hanna Varachkina | Tillmann Dönicke | Luisa Gödeke
Proceedings of the 18th Joint ACL - ISO Workshop on Interoperable Semantic Annotation within LREC2022
The annotation and automatic recognition of non-fictional discourse within a text is an important, yet unresolved task in literary research. While non-fictional passages can consist of several clauses or sentences, we argue that 1) an entity-level classification of fictionality and 2) the linking of Wikidata identifiers can be used to automatically identify (non-)fictional discourse. We query Wikidata and DBpedia for relevant information about a requested entity as well as the corresponding literary text to determine the entity’s fictionality status and assign a Wikidata identifier, if unequivocally possible. We evaluate our methods on an exemplary text from our diachronic literary corpus, where our methods classify 97% of persons and 62% of locations correctly as fictional or real. Furthermore, 75% of the resolved persons and 43% of the resolved locations are resolved correctly. In a quantitative experiment, we apply the entity-level fictionality tagger to our corpus and conclude that more non-fictional passages can be identified when information about real entities is available.
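The two steps named in the abstract, entity-level fictionality classification and identifier linking only when a match is unambiguous, can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that candidate entities have already been retrieved from Wikidata, and it relies solely on the "instance of" (P31) classes `Q5` (human) and `Q15632617` (fictional human). The names `Candidate`, `fictionality`, and `resolve` are hypothetical, as is the rule that exactly one candidate counts as an unequivocal match.

```python
from dataclasses import dataclass

# Wikidata "instance of" (P31) classes used in this illustration.
HUMAN = "Q5"                   # human
FICTIONAL_HUMAN = "Q15632617"  # fictional human


@dataclass(frozen=True)
class Candidate:
    """A candidate Wikidata entity for a mention (hypothetical structure)."""
    qid: str
    instance_of: frozenset


def fictionality(candidate: Candidate) -> str:
    """Classify one candidate as 'fictional', 'real', or 'unknown'."""
    if FICTIONAL_HUMAN in candidate.instance_of:
        return "fictional"
    if HUMAN in candidate.instance_of:
        return "real"
    return "unknown"


def resolve(candidates: list) -> tuple:
    """Return (qid, status) for a mention.

    Assumption: an identifier is assigned only when exactly one candidate
    exists; a status is assigned only when all candidates agree on it.
    """
    statuses = {fictionality(c) for c in candidates}
    status = statuses.pop() if len(statuses) == 1 else "unknown"
    qid = candidates[0].qid if len(candidates) == 1 else None
    return qid, status


# Q937 is the Wikidata item for Albert Einstein; the other IDs are placeholders.
einstein = Candidate("Q937", frozenset({HUMAN}))
print(resolve([einstein]))

ambiguous = [
    Candidate("Q-placeholder-A", frozenset({FICTIONAL_HUMAN})),
    Candidate("Q-placeholder-B", frozenset({FICTIONAL_HUMAN})),
]
print(resolve(ambiguous))
```

Separating the status decision from the identifier decision mirrors the abstract's distinction between classifying an entity correctly and resolving it correctly: a mention can be tagged as fictional or real even when no single identifier can be assigned.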