Stefan Thater


2018

SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge
Simon Ostermann | Michael Roth | Ashutosh Modi | Stefan Thater | Manfred Pinkal
Proceedings of the 12th International Workshop on Semantic Evaluation

This report summarizes the results of the SemEval 2018 task on machine comprehension using commonsense knowledge. For this machine comprehension task, we created a new corpus, MCScript. It contains a high number of questions that require commonsense knowledge for finding the correct answer. Eleven teams from four different countries participated in this shared task, most of them using neural approaches. The best performing system achieves an accuracy of 83.95%, outperforming the baselines by a large margin, but still falls far short of the human upper bound, which was found to be 98%.

Mapping Texts to Scripts: An Entailment Study
Simon Ostermann | Hannah Seitz | Stefan Thater | Manfred Pinkal
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

MCScript: A Novel Dataset for Assessing Machine Comprehension Using Script Knowledge
Simon Ostermann | Ashutosh Modi | Michael Roth | Stefan Thater | Manfred Pinkal
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2017

Inducing Script Structure from Crowdsourced Event Descriptions via Semi-Supervised Clustering
Lilian Wanzare | Alessandra Zarcone | Stefan Thater | Manfred Pinkal
Proceedings of the 2nd Workshop on Linking Models of Lexical, Sentential and Discourse-level Semantics

We present a semi-supervised clustering approach to induce script structure from crowdsourced descriptions of event sequences by grouping event descriptions into paraphrase sets (representing event types) and inducing their temporal order. Our approach exploits semantic and positional similarity and allows for flexible event order, thus overcoming the rigidity of previous approaches. We incorporate crowdsourced alignments as prior knowledge and show that exploiting a small number of alignments results in a substantial improvement in cluster quality over state-of-the-art models and provides an appropriate basis for the induction of temporal order. We also show a coverage study to demonstrate the scalability of our approach.

A Mixture Model for Learning Multi-Sense Word Embeddings
Dai Quoc Nguyen | Dat Quoc Nguyen | Ashutosh Modi | Stefan Thater | Manfred Pinkal
Proceedings of the 6th Joint Conference on Lexical and Computational Semantics (*SEM 2017)

Word embeddings are now a standard technique for inducing meaning representations for words. To obtain good representations, it is important to take into account the different senses of a word. In this paper, we propose a mixture model for learning multi-sense word embeddings. Our model generalizes previous work in that it allows different weights to be induced for the different senses of a word. The experimental results show that our model outperforms previous models on standard evaluation tasks.

Aligning Script Events with Narrative Texts
Simon Ostermann | Michael Roth | Stefan Thater | Manfred Pinkal
Proceedings of the 6th Joint Conference on Lexical and Computational Semantics (*SEM 2017)

Script knowledge plays a central role in text understanding and is relevant for a variety of downstream tasks. In this paper, we consider two recent datasets which provide a rich and general representation of script events in terms of paraphrase sets. We introduce the task of mapping event mentions in narrative texts to such script event types, and present a model for this task that exploits rich linguistic representations as well as information on temporal ordering. The results of our experiments demonstrate that this complex task is indeed feasible.

Sequence to Sequence Learning for Event Prediction
Dai Quoc Nguyen | Dat Quoc Nguyen | Cuong Xuan Chu | Stefan Thater | Manfred Pinkal
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

This paper presents an approach to the task of predicting an event description from a preceding sentence in a text. Our approach explores sequence-to-sequence learning using a bidirectional multi-layer recurrent neural network. Our approach substantially outperforms previous work in terms of the BLEU score on two datasets derived from WikiHow and DeScript respectively. Since the BLEU score is not easy to interpret as a measure of event prediction, we complement our study with a second evaluation that exploits the rich linguistic annotation of gold paraphrase sets of events.

2016

Improving POS Tagging of German Learner Language in a Reading Comprehension Scenario
Lena Keiper | Andrea Horbach | Stefan Thater
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

We present a novel method to automatically improve the accuracy of part-of-speech taggers on learner language. The key idea underlying our approach is to exploit the structure of a typical language learner task and automatically induce POS information for out-of-vocabulary (OOV) words. To evaluate the effectiveness of our approach, we add manual POS and normalization information to an existing language learner corpus. Our evaluation shows an increase in accuracy from 72.4% to 81.5% on OOV words.

A Corpus of Literal and Idiomatic Uses of German Infinitive-Verb Compounds
Andrea Horbach | Andrea Hensler | Sabine Krome | Jakob Prange | Werner Scholze-Stubenrecht | Diana Steffen | Stefan Thater | Christian Wellner | Manfred Pinkal
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

We present an annotation study on a representative dataset of literal and idiomatic uses of German infinitive-verb compounds in newspaper and journal texts. Infinitive-verb compounds form a challenge for writers of German, because spelling regulations are different for literal and idiomatic uses. Through the participation of expert lexicographers we were able to obtain a high-quality corpus resource which offers itself as a testbed for automatic idiomaticity detection and coarse-grained word-sense disambiguation. We trained a classifier on the corpus which was able to distinguish literal and idiomatic uses with an accuracy of 85%.

Unsupervised Ranked Cross-Lingual Lexical Substitution for Low-Resource Languages
Stefan Ecker | Andrea Horbach | Stefan Thater
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

We propose an unsupervised system for a variant of cross-lingual lexical substitution (CLLS) to be used in a reading scenario in computer-assisted language learning (CALL), in which single-word translations provided by a dictionary are ranked according to their appropriateness in context. In contrast to most alternative systems, ours does not rely on either parallel corpora or machine translation systems, making it suitable for low-resource languages as the language to be learned. This is achieved by a graph-based scoring mechanism which can deal with ambiguous translations of context words provided by a dictionary. Due to this decoupling from the source language, we need monolingual corpus resources only for the target language, i.e. the language of the translation candidates. We evaluate our approach for the language pair Norwegian Nynorsk-English on an exploratory manually annotated gold standard and report promising results. When running our system on the original SemEval CLLS task, we rank 6th out of 18 (including 2 baselines and our 2 system variants) in the best evaluation.

A Crowdsourced Database of Event Sequence Descriptions for the Acquisition of High-quality Script Knowledge
Lilian D. A. Wanzare | Alessandra Zarcone | Stefan Thater | Manfred Pinkal
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Scripts are standardized event sequences describing typical everyday activities, which play an important role in the computational modeling of cognitive abilities (in particular for natural language processing). We present a large-scale crowdsourced collection of explicit linguistic descriptions of script-specific event sequences (40 scenarios with 100 sequences each). The corpus is enriched with crowdsourced alignment annotation on a subset of the event descriptions, to be used in future work as seed data for automatic alignment of event descriptions (for example via clustering). The event descriptions to be aligned were chosen among those expected to have the strongest corrective effect on the clustering algorithm. The alignment annotation was evaluated against a gold standard of expert annotators. The resulting database of partially-aligned script-event descriptions provides a sound empirical basis for inducing high-quality script knowledge, as well as for any task involving alignment and paraphrase detection of events.

Event participant modelling with neural networks
Ottokar Tilk | Vera Demberg | Asad Sayeed | Dietrich Klakow | Stefan Thater
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

UdS-(retrain|distributional|surface): Improving POS Tagging for OOV Words in German CMC and Web Data
Jakob Prange | Andrea Horbach | Stefan Thater
Proceedings of the 10th Web as Corpus Workshop

2014

What Substitutes Tell Us - Analysis of an “All-Words” Lexical Substitution Corpus
Gerhard Kremer | Katrin Erk | Sebastian Padó | Stefan Thater
Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics

2013

Grounding Action Descriptions in Videos
Michaela Regneri | Marcus Rohrbach | Dominikus Wetzel | Stefan Thater | Bernt Schiele | Manfred Pinkal
Transactions of the Association for Computational Linguistics, Volume 1

Recent work has shown that the integration of visual information into text-based models can substantially improve model predictions, but so far only visual information extracted from static images has been used. In this paper, we consider the problem of grounding sentences describing actions in visual information extracted from videos. We present a general purpose corpus that aligns high quality videos with multiple natural language descriptions of the actions portrayed in the videos, together with an annotation of how similar the action descriptions are to each other. Experimental results demonstrate that a text-based model of similarity between actions improves substantially when combined with visual information from videos depicting the described actions.

2012

Saarland: Vector-based models of semantic textual similarity
Georgiana Dinu | Stefan Thater
*SEM 2012: The First Joint Conference on Lexical and Computational Semantics – Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval 2012)

A Comparison of Knowledge-based Algorithms for Graded Word Sense Assignment
Annemarie Friedrich | Nikos Engonopoulos | Stefan Thater | Manfred Pinkal
Proceedings of COLING 2012: Posters

A comparison of models of word meaning in context
Georgiana Dinu | Stefan Thater | Soeren Laue
Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

2011

Robust Disambiguation of Named Entities in Text
Johannes Hoffart | Mohamed Amir Yosef | Ilaria Bordino | Hagen Fürstenau | Manfred Pinkal | Marc Spaniol | Bilyana Taneva | Stefan Thater | Gerhard Weikum
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing

Word Meaning in Context: A Simple and Effective Vector Model
Stefan Thater | Hagen Fürstenau | Manfred Pinkal
Proceedings of 5th International Joint Conference on Natural Language Processing

Proceedings of the TextInfer 2011 Workshop on Textual Entailment
Sebastian Padó | Stefan Thater
Proceedings of the TextInfer 2011 Workshop on Textual Entailment

2010

Computing Weakest Readings
Alexander Koller | Stefan Thater
Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics

Contextualizing Semantic Representations Using Syntactically Enriched Vector Models
Stefan Thater | Hagen Fürstenau | Manfred Pinkal
Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics

2009

Ranking Paraphrases in Context
Stefan Thater | Georgiana Dinu | Manfred Pinkal
Proceedings of the 2009 Workshop on Applied Textual Inference (TextInfer)

2008

Regular Tree Grammars as a Formalism for Scope Underspecification
Alexander Koller | Michaela Regneri | Stefan Thater
Proceedings of ACL-08: HLT

2007

A Semantic Approach To Textual Entailment: System Evaluation and Task Analysis
Aljoscha Burchardt | Nils Reiter | Stefan Thater | Anette Frank
Proceedings of the ACL-PASCAL Workshop on Textual Entailment and Paraphrasing

2006

Towards a redundancy elimination algorithm for underspecified descriptions
Alexander Koller | Stefan Thater
Proceedings of the Fifth International Workshop on Inference in Computational Semantics (ICoS-5)

An Improved Redundancy Elimination Algorithm for Underspecified Representations
Alexander Koller | Stefan Thater
Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics

2005

Efficient Solving and Exploration of Scope Ambiguities
Alexander Koller | Stefan Thater
Proceedings of the ACL Interactive Poster and Demonstration Sessions

The Evolution of Dominance Constraint Solvers
Alexander Koller | Stefan Thater
Proceedings of Workshop on Software

2004

Minimal Recursion Semantics as Dominance Constraints: Translation, Evaluation, and Analysis
Ruth Fuchss | Alexander Koller | Joachim Niehren | Stefan Thater
Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL-04)

A Relational Syntax-Semantics Interface Based on Dependency Grammar
Ralph Debusmann | Denys Duchier | Alexander Koller | Marco Kuhlmann | Gert Smolka | Stefan Thater
COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics

TAG Parsing as Model Enumeration
Ralph Debusmann | Denys Duchier | Marco Kuhlmann | Stefan Thater
Proceedings of the 7th International Workshop on Tree Adjoining Grammar and Related Formalisms

2003

Underspecification formalisms: Hole semantics as dominance constraints
Alexander Koller | Joachim Niehren | Stefan Thater
10th Conference of the European Chapter of the Association for Computational Linguistics

Bridging the Gap Between Underspecification Formalisms: Minimal Recursion Semantics as Dominance Constraints
Joachim Niehren | Stefan Thater
Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics

2001

Generating with a Grammar Based on Tree Descriptions: a Constraint-Based Approach
Claire Gardent | Stefan Thater
Proceedings of the 39th Annual Meeting of the Association for Computational Linguistics