Tillmann Dönicke


2021

pdf bib
Annotating Quantified Phenomena in Complex Sentence Structures Using the Example of Generalising Statements in Literary Texts
Tillmann Dönicke | Luisa Gödeke | Hanna Varachkina
Proceedings of the 17th Joint ACL - ISO Workshop on Interoperable Semantic Annotation

We present a tagset for the annotation of quantification which we currently use to annotate certain quantified statements in fictional works of literature. Literary texts feature a rich variety in expressing quantification, including a broad range of lexemes to express quantifiers and complex sentence structures to express the restrictor and the nuclear scope of a quantification. Our tagset consists of seven tags and covers all types of quantification that occur in natural language, including vague quantification and generic quantification. In the second part of the paper, we introduce our German corpus with annotations of generalising statements, which form a proper subset of quantified statements.

pdf bib
Delexicalised Multilingual Discourse Segmentation for DISRPT 2021 and Tense, Mood, Voice and Modality Tagging for 11 Languages
Tillmann Dönicke
Proceedings of the 2nd Shared Task on Discourse Relation Parsing and Treebanking (DISRPT 2021)

This paper describes our participating system for the Shared Task on Discourse Segmentation and Connective Identification across Formalisms and Languages. Key features of the presented approach are the formulation as a clause-level classification task, a language-independent feature inventory based on Universal Dependencies grammar, and composite-verb-form analysis. The achieved F1 is 92% for German and English and lower for other languages. The paper also presents a clause-level tagger for grammatical tense, aspect, mood, voice and modality in 11 languages.

2020

pdf bib
Identifying and Handling Cross-Treebank Inconsistencies in UD: A Pilot Study
Tillmann Dönicke | Xiang Yu | Jonas Kuhn
Proceedings of the Fourth Workshop on Universal Dependencies (UDW 2020)

The Universal Dependencies treebanks are a still-growing collection of treebanks for a wide range of languages, all annotated with a common inventory of dependency relations. Yet, the usages of the relations can be categorically different even for treebanks of the same language. We present a pilot study on identifying such inconsistencies in a language-independent way and conduct an experiment which illustrates that a proper handling of inconsistencies can improve parsing performance by several percentage points.

pdf bib
#GCDH at WNUT-2020 Task 2: BERT-Based Models for the Detection of Informativeness in English COVID-19 Related Tweets
Hanna Varachkina | Stefan Ziehe | Tillmann Dönicke | Franziska Pannach
Proceedings of the Sixth Workshop on Noisy User-generated Text (W-NUT 2020)

In this system paper, we present a transformer-based approach to the detection of informativeness in English tweets on the topic of the current COVID-19 pandemic. Our models distinguish informative tweets, i.e. tweets containing statistics on recovery, suspected and confirmed cases and COVID-19 related deaths, from uninformative tweets. We present two transformer-based approaches as well as a Naive Bayes classifier and a support vector machine as baseline systems. The transformer models outperform the baselines by more than 0.1 in F1-score, with F1-scores of 0.9091 and 0.9036. Our models were submitted to the shared task Identification of informative COVID-19 English tweets WNUT-2020 Task 2.

pdf bib
Real-Valued Logics for Typological Universals: Framework and Application
Tillmann Dönicke | Xiang Yu | Jonas Kuhn
Proceedings of the 28th International Conference on Computational Linguistics

This paper proposes a framework for the expression of typological statements which uses real-valued logics to capture the empirical truth value (truth degree) of a formula on a given data source, e.g. a collection of multilingual treebanks with comparable annotation. The formulae can be arbitrarily complex expressions of propositional logic. To illustrate the usefulness of such a framework, we present experiments on the Universal Dependencies treebanks for two use cases: (i) empirical (re-)evaluation of established formulae against the spectrum of available treebanks and (ii) evaluating new formulae (i.e. potential candidates for universals) generated by a search algorithm.

pdf bib
Clause-Level Tense, Mood, Voice and Modality Tagging for German
Tillmann Dönicke
Proceedings of the 19th International Workshop on Treebanks and Linguistic Theories

2019

pdf bib
Multiclass Text Classification on Unbalanced, Sparse and Noisy Data
Matthias Damaschk | Tillmann Dönicke | Florian Lux
Proceedings of the First NLPL Workshop on Deep Learning for Natural Language Processing

This paper discusses methods to improve the performance of text classification on data that is difficult to classify due to a large number of unbalanced classes with noisy examples. A variety of features are tested, in combination with three different neural-network-based methods with increasing complexity. The classifiers are applied to a songtext–artist dataset which is large, unbalanced and noisy. We come to the conclusion that substantial improvement can be obtained by removing unbalancedness and sparsity from the data. This fulfils a classification task unsatisfactorily—however, with contemporary methods, it is a practical step towards fairly satisfactory results.