Rüdiger Gleim


WikiDragon: A Java Framework For Diachronic Content And Network Analysis Of MediaWikis
Rüdiger Gleim | Alexander Mehler | Sung Y. Song
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)


Lemmatization and Morphological Tagging in German and Latin: A Comparison and a Survey of the State-of-the-art
Steffen Eger | Rüdiger Gleim | Alexander Mehler
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

This paper addresses the challenge of morphological tagging and lemmatization in morphologically rich languages, using German and Latin as examples. We focus on the question of what a practitioner can expect when using state-of-the-art solutions out of the box. Moreover, we contrast these with old(er) methods and implementations for POS tagging. We examine to what degree recent efforts in tagger development are reflected by improved accuracies, and at what cost in terms of training and processing time. We also conduct in-domain vs. out-domain evaluation. Out-domain evaluations are particularly insightful because the distribution of the data being tagged by a user will typically differ from the distribution on which the tagger has been trained. Furthermore, two lemmatization techniques are evaluated. Finally, we compare pipeline tagging vs. a tagging approach that acknowledges dependencies between inflectional categories.


Computational Linguistics for Mere Mortals - Powerful but Easy-to-use Linguistic Processing for Scientists in the Humanities
Rüdiger Gleim | Alexander Mehler
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

Delivering linguistic resources and easy-to-use methods to a broad public in the humanities is a challenging task. On the one hand, users rightly demand easy-to-use interfaces; on the other hand, they want access to the full flexibility and power of the functions being offered. Even though a growing number of excellent systems exist which offer convenient means to use linguistic resources and methods, they usually focus on a specific domain, for example corpus exploration or text categorization. Architectures which address a broad scope of applications are still rare. This article introduces the eHumanities Desktop, an online system for corpus management, processing and analysis which aims at bridging the gap between powerful command line tools and intuitive user interfaces.


eHumanities Desktop - An Online System for Corpus Management and Analysis in Support of Computing in the Humanities
Rüdiger Gleim | Ulli Waltinger | Alexandra Ernst | Alexander Mehler | Tobias Feith | Dietmar Esch
Proceedings of the Demonstrations Session at EACL 2009


Towards a Reference Corpus of Web Genres for the Evaluation of Genre Identification Systems
Georg Rehm | Marina Santini | Alexander Mehler | Pavel Braslavski | Rüdiger Gleim | Andrea Stubbe | Svetlana Symonenko | Mirko Tavosanis | Vedrana Vidulin
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

We present initial results from an international and multi-disciplinary research collaboration that aims at the construction of a reference corpus of web genres. The primary application scenario for which we plan to build this resource is the automatic identification of web genres. Web genres are rather difficult to capture and to describe in their entirety, but we plan for the finished reference corpus to contain multi-level tags of the respective genre or genres a web document or a website instantiates. As the construction of such a corpus is by no means a trivial task, we discuss several alternatives that are, for the time being, mostly based on existing collections. Furthermore, we discuss a shared set of genre categories and a multi-purpose tool as two additional prerequisites for a reference corpus of web genres.

A Unified Database of Dependency Treebanks: Integrating, Quantifying & Evaluating Dependency Data
Olga Pustylnikov | Alexander Mehler | Rüdiger Gleim
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

This paper describes a database of 11 dependency treebanks which were unified by means of a two-dimensional graph format. The format was evaluated with respect to storage complexity on the one hand and efficiency of data access on the other. An example of how the treebanks can be integrated within a unique interface is given by means of the DTDB interface.


Web corpus mining by instance of Wikipedia
Rüdiger Gleim | Alexander Mehler | Matthias Dehmer
Proceedings of the 2nd International Workshop on Web as Corpus