Franz Matthies

2016

pdf abs
UIMA-Based JCoRe 2.0 Goes GitHub and Maven Central ― State-of-the-Art Software Resource Engineering and Distribution of NLP Pipelines
Udo Hahn | Franz Matthies | Erik Faessler | Johannes Hellrich
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

We introduce JCoRe 2.0, the relaunch of a UIMA-based open software repository for full-scale natural language processing originating from the Jena University Language & Information Engineering (JULIE) Lab. In an attempt to put the new release of JCoRe on firm software engineering ground, we uploaded it to GitHub, a social coding platform, with an underlying source code versioning system and various means to support collaboration for software development and code modification management. In order to automate the builds of complex NLP pipelines and properly represent and track dependencies of the underlying Java code, we incorporated Maven as part of our software configuration management efforts. In the meantime, we have deployed our artifacts on Maven Central, as well. JCoRe 2.0 offers a broad range of text analytics functionality (mostly) for English-language scientific abstracts and full-text articles, especially from the life sciences domain.

pdf abs
CodE Alltag: A German-Language E-Mail Corpus
Ulrike Krieg-Holz | Christian Schuschnig | Franz Matthies | Benjamin Redling | Udo Hahn
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

We introduce CODE ALLTAG, a text corpus composed of German-language e-mails. It is divided into two partitions: the first of these portions, CODE ALLTAG_XL, consists of a bulk-size collection drawn from an openly accessible e-mail archive (roughly 1.5M e-mails), whereas the second portion, CODE ALLTAG_S+d, is much smaller in size (less than thousand e-mails), yet excels with demographic data from each author of an e-mail. CODE ALLTAG, thus, currently constitutes the largest E-Mail corpus ever built. In this paper, we describe, for both parts, the solicitation process for gathering e-mails, present descriptive statistical properties of the corpus, and, for CODE ALLTAG_S+d, reveal a compilation of demographic features of the donors of e-mails.

Franz Matthies

2016

2013

Co-authors

Venues