Manfred Pinkal

2019

pdf abs
MCScript2.0: A Machine Comprehension Corpus Focused on Script Events and Participants
Simon Ostermann | Michael Roth | Manfred Pinkal
Proceedings of the Eighth Joint Conference on Lexical and Computational Semantics (*SEM 2019)

We introduce MCScript2.0, a machine comprehension corpus for the end-to-end evaluation of script knowledge. MCScript2.0 contains approx. 20,000 questions on approx. 3,500 texts, crowdsourced based on a new collection process that results in challenging questions. Half of the questions cannot be answered from the reading texts, but require the use of commonsense and, in particular, script knowledge. We give a thorough analysis of our corpus and show that while the task is not challenging to humans, existing machine comprehension models fail to perform well on the data, even if they make use of a commonsense knowledge base. The dataset is available at http://www.sfb1102.uni-saarland.de/?page_id=2582

pdf abs
Detecting Everyday Scenarios in Narrative Texts
Lilian Diana Awuor Wanzare | Michael Roth | Manfred Pinkal
Proceedings of the Second Workshop on Storytelling

Script knowledge consists of detailed information on everyday activities. Such information is often taken for granted in text and needs to be inferred by readers. Therefore, script knowledge is a central component to language comprehension. Previous work on representing scripts is mostly based on extensive manual work or limited to scenarios that can be found with sufficient redundancy in large corpora. We introduce the task of scenario detection, in which we identify references to scripts. In this task, we address a wide range of different scripts (200 scenarios) and we attempt to identify all references to them in a collection of narrative texts. We present a first benchmark data set and a baseline model that tackles scenario detection using techniques from topic segmentation and text classification.

2018

pdf abs
Grounding Semantic Roles in Images
Carina Silberer | Manfred Pinkal
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

We address the task of visual semantic role labeling (vSRL), the identification of the participants of a situation or event in a visual scene, and their labeling with their semantic relations to the event or situation. We render candidate participants as image regions of objects, and train a model which learns to ground roles in the regions which depict the corresponding participant. Experimental results demonstrate that we can train a vSRL model without reliance on prohibitive image-based role annotations, by utilizing noisy data which we extract automatically from image captions using a linguistic SRL system. Furthermore, our model induces frame—semantic visual representations, and their comparison to previous work on supervised visual verb sense disambiguation yields overall better results.

pdf abs
SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge
Simon Ostermann | Michael Roth | Ashutosh Modi | Stefan Thater | Manfred Pinkal
Proceedings of the 12th International Workshop on Semantic Evaluation

This report summarizes the results of the SemEval 2018 task on machine comprehension using commonsense knowledge. For this machine comprehension task, we created a new corpus, MCScript. It contains a high number of questions that require commonsense knowledge for finding the correct answer. 11 teams from 4 different countries participated in this shared task, most of them used neural approaches. The best performing system achieves an accuracy of 83.95%, outperforming the baselines by a large margin, but still far from the human upper bound, which was found to be at 98%.

pdf
Multi-layer Annotation of the Rigveda
Oliver Hellwig | Heinrich Hettrich | Ashutosh Modi | Manfred Pinkal
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

pdf
Mapping Texts to Scripts: An Entailment Study
Simon Ostermann | Hannah Seitz | Stefan Thater | Manfred Pinkal
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

pdf
MCScript: A Novel Dataset for Assessing Machine Comprehension Using Script Knowledge
Simon Ostermann | Ashutosh Modi | Michael Roth | Stefan Thater | Manfred Pinkal
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

pdf
Semi-Supervised Clustering for Short Answer Scoring
Andrea Horbach | Manfred Pinkal
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2017

pdf abs
Modeling Semantic Expectation: Using Script Knowledge for Referent Prediction
Ashutosh Modi | Ivan Titov | Vera Demberg | Asad Sayeed | Manfred Pinkal
Transactions of the Association for Computational Linguistics, Volume 5

Recent research in psycholinguistics has provided increasing evidence that humans predict upcoming content. Prediction also affects perception and might be a key to robustness in human language processing. In this paper, we investigate the factors that affect human prediction by building a computational model that can predict upcoming discourse referents based on linguistic knowledge alone vs. linguistic knowledge jointly with common-sense knowledge in the form of scripts. We find that script knowledge significantly improves model estimates of human predictions. In a second study, we test the highly controversial hypothesis that predictability influences referring expression type but do not find evidence for such an effect.

pdf abs
Sequence to Sequence Learning for Event Prediction
Dai Quoc Nguyen | Dat Quoc Nguyen | Cuong Xuan Chu | Stefan Thater | Manfred Pinkal
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

This paper presents an approach to the task of predicting an event description from a preceding sentence in a text. Our approach explores sequence-to-sequence learning using a bidirectional multi-layer recurrent neural network. Our approach substantially outperforms previous work in terms of the BLEU score on two datasets derived from WikiHow and DeScript respectively. Since the BLEU score is not easy to interpret as a measure of event prediction, we complement our study with a second evaluation that exploits the rich linguistic annotation of gold paraphrase sets of events.

pdf bib abs
Inducing Script Structure from Crowdsourced Event Descriptions via Semi-Supervised Clustering
Lilian Wanzare | Alessandra Zarcone | Stefan Thater | Manfred Pinkal
Proceedings of the 2nd Workshop on Linking Models of Lexical, Sentential and Discourse-level Semantics

We present a semi-supervised clustering approach to induce script structure from crowdsourced descriptions of event sequences by grouping event descriptions into paraphrase sets (representing event types) and inducing their temporal order. Our approach exploits semantic and positional similarity and allows for flexible event order, thus overcoming the rigidity of previous approaches. We incorporate crowdsourced alignments as prior knowledge and show that exploiting a small number of alignments results in a substantial improvement in cluster quality over state-of-the-art models and provides an appropriate basis for the induction of temporal order. We also show a coverage study to demonstrate the scalability of our approach.

pdf abs
A Mixture Model for Learning Multi-Sense Word Embeddings
Dai Quoc Nguyen | Dat Quoc Nguyen | Ashutosh Modi | Stefan Thater | Manfred Pinkal
Proceedings of the 6th Joint Conference on Lexical and Computational Semantics (*SEM 2017)

Word embeddings are now a standard technique for inducing meaning representations for words. For getting good representations, it is important to take into account different senses of a word. In this paper, we propose a mixture model for learning multi-sense word embeddings. Our model generalizes the previous works in that it allows to induce different weights of different senses of a word. The experimental results show that our model outperforms previous models on standard evaluation tasks.

pdf abs
Aligning Script Events with Narrative Texts
Simon Ostermann | Michael Roth | Stefan Thater | Manfred Pinkal
Proceedings of the 6th Joint Conference on Lexical and Computational Semantics (*SEM 2017)

Script knowledge plays a central role in text understanding and is relevant for a variety of downstream tasks. In this paper, we consider two recent datasets which provide a rich and general representation of script events in terms of paraphrase sets. We introduce the task of mapping event mentions in narrative texts to such script event types, and present a model for this task that exploits rich linguistic representations as well as information on temporal ordering. The results of our experiments demonstrate that this complex task is indeed feasible.

2016

pdf
Situation entity types: automatic classification of clause-level aspect
Annemarie Friedrich | Alexis Palmer | Manfred Pinkal
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

We present an annotation study on a representative dataset of literal and idiomatic uses of German infinitive-verb compounds in newspaper and journal texts. Infinitive-verb compounds form a challenge for writers of German, because spelling regulations are different for literal and idiomatic uses. Through the participation of expert lexicographers we were able to obtain a high-quality corpus resource which offers itself as a testbed for automatic idiomaticity detection and coarse-grained word-sense disambiguation. We trained a classifier on the corpus which was able to distinguish literal and idiomatic uses with an accuracy of 85 %.

pdf abs
InScript: Narrative texts annotated with script information
Ashutosh Modi | Tatjana Anikina | Simon Ostermann | Manfred Pinkal
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

This paper presents the InScript corpus (Narrative Texts Instantiating Script structure). InScript is a corpus of 1,000 stories centered around 10 different scenarios. Verbs and noun phrases are annotated with event and participant types, respectively. Additionally, the text is annotated with coreference information. The corpus shows rich lexical variation and will serve as a unique resource for the study of the role of script knowledge in natural language processing.

pdf abs
A Crowdsourced Database of Event Sequence Descriptions for the Acquisition of High-quality Script Knowledge
Lilian D. A. Wanzare | Alessandra Zarcone | Stefan Thater | Manfred Pinkal
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Scripts are standardized event sequences describing typical everyday activities, which play an important role in the computational modeling of cognitive abilities (in particular for natural language processing). We present a large-scale crowdsourced collection of explicit linguistic descriptions of script-specific event sequences (40 scenarios with 100 sequences each). The corpus is enriched with crowdsourced alignment annotation on a subset of the event descriptions, to be used in future work as seed data for automatic alignment of event descriptions (for example via clustering). The event descriptions to be aligned were chosen among those expected to have the strongest corrective effect on the clustering algorithm. The alignment annotation was evaluated against a gold standard of expert annotators. The resulting database of partially-aligned script-event descriptions provides a sound empirical basis for inducing high-quality script knowledge, as well as for any task involving alignment and paraphrase detection of events.

2015

pdf
Learning to predict script events from domain-specific text
Rachel Rudinger | Vera Demberg | Ashutosh Modi | Benjamin Van Durme | Manfred Pinkal
Proceedings of the Fourth Joint Conference on Lexical and Computational Semantics

pdf
Annotating genericity: a survey, a scheme, and a corpus
Annemarie Friedrich | Alexis Palmer | Melissa Peate Sørensen | Manfred Pinkal
Proceedings of the 9th Linguistic Annotation Workshop

pdf bib
Linking discourse modes and situation entity types in a cross-linguistic corpus study
Kleio-Isidora Mavridou | Annemarie Friedrich | Melissa Peate Sørensen | Alexis Palmer | Manfred Pinkal
Proceedings of the First Workshop on Linking Computational Models of Lexical, Sentential and Discourse-level Semantics

pdf
Annotating Entailment Relations for Shortanswer Questions
Simon Ostermann | Andrea Horbach | Manfred Pinkal
Proceedings of the 2nd Workshop on Natural Language Processing Techniques for Educational Applications

pdf
Discourse-sensitive Automatic Identification of Generic Expressions
Annemarie Friedrich | Manfred Pinkal
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

pdf
Automatic recognition of habituals: a three-way classification of clausal aspect
Annemarie Friedrich | Manfred Pinkal
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

2014

pdf abs
Aligning Predicate-Argument Structures for Paraphrase Fragment Extraction
Michaela Regneri | Rui Wang | Manfred Pinkal
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

Paraphrases and paraphrasing algorithms have been found of great importance in various natural language processing tasks. While most paraphrase extraction approaches extract equivalent sentences, sentences are an inconvenient unit for further processing, because they are too specific, and often not exact paraphrases. Paraphrase fragment extraction is a technique that post-processes sentential paraphrases and prunes them to more convenient phrase-level units. We present a new approach that uses semantic roles to extract paraphrase fragments from sentence pairs that share semantic content to varying degrees, including full paraphrases. In contrast to previous systems, the use of semantic parses allows for extracting paraphrases with high wording variance and different syntactic categories. The approach is tested on four different input corpora and compared to two previous systems for extracting paraphrase fragments. Our system finds three times as many good paraphrase fragments per sentence pair as the baselines, and at the same time outputs 30% fewer unrelated fragment pairs.

pdf
A Hierarchical Bayesian Model for Unsupervised Induction of Script Knowledge
Lea Frermann | Ivan Titov | Manfred Pinkal
Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics

pdf
Paraphrase Detection for Short Answer Scoring
Nikolina Koleva | Andrea Horbach | Alexis Palmer | Simon Ostermann | Manfred Pinkal
Proceedings of the third workshop on NLP for computer-assisted language learning

2013

Recent work has shown that the integration of visual information into text-based models can substantially improve model predictions, but so far only visual information extracted from static images has been used. In this paper, we consider the problem of grounding sentences describing actions in visual information extracted from videos. We present a general purpose corpus that aligns high quality videos with multiple natural language descriptions of the actions portrayed in the videos, together with an annotation of how similar the action descriptions are to each other. Experimental results demonstrate that a text-based model of similarity between actions improves substantially when combined with visual information from videos depicting the described actions.

pdf
Using the text to evaluate short answers for reading comprehension exercises
Andrea Horbach | Alexis Palmer | Manfred Pinkal
Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 1: Proceedings of the Main Conference and the Shared Task: Semantic Textual Similarity

We present a method and a software tool, the FrameNet Transformer, for deriving customized versions of the FrameNet database based on frame and frame element relations. The FrameNet Transformer allows users to iteratively coarsen the FrameNet sense inventory in two ways. First, the tool can merge entire frames that are related by user-specified relations. Second, it can merge word senses that belong to frames related by specified relations. Both methods can be interleaved. The Transformer automatically outputs format-compliant FrameNet versions, including modified corpus annotation files that can be used for automatic processing. The customized FrameNet versions can be used to determine which granularity is suitable for particular applications. In our evaluation of the tool, we show that our method increases accuracy of statistical semantic parsers by reducing the number of word-senses (frames) per lemma, and increasing the number of annotated sentences per lexical unit and frame. We further show in an experiment on the FATE corpus that by coarsening FrameNet we do not incur a significant loss of information that is relevant to the Recognizing Textual Entailment task.

pdf
Contextualizing Semantic Representations Using Syntactically Enriched Vector Models
Stefan Thater | Hagen Fürstenau | Manfred Pinkal
Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics

pdf
Learning Script Knowledge with Web Experiments
Michaela Regneri | Alexander Koller | Manfred Pinkal
Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics

2009

pdf
Ranking Paraphrases in Context
Stefan Thater | Georgiana Dinu | Manfred Pinkal
Proceedings of the 2009 Workshop on Applied Textual Inference (TextInfer)

2008

pdf abs
CLIoS: Cross-lingual Induction of Speech Recognition Grammars
Nadine Perera | Michael Pitz | Manfred Pinkal
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

We present an approach for the cross-lingual induction of speech recognition grammars that separates the task of translation from the task of grammar generation. The source speech recognition grammar is used to generate phrases, which are translated by a common translation service. The target recognition grammar is induced by using the production rules of the source language, manually translated sentences and a statistical word alignment tool. We induce grammars for the target languages Spanish and Japanese. The coverage of the resulting grammars is evaluated on two corpora and compared quantitatively and qualitatively to a grammar induced with unsupervised monolingual grammar induction.

2006

pdf abs
The SALSA Corpus: a German Corpus Resource for Lexical Semantics
Aljoscha Burchardt | Katrin Erk | Anette Frank | Andrea Kowalski | Sebastian Padó | Manfred Pinkal
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

This paper describes the SALSA corpus, a large German corpus manually annotated with manual role-semantic annotation, based on the syntactically annotated TIGER newspaper corpus. The first release, comprising about 20,000 annotated predicate instances (about half the TIGER corpus), is scheduled for mid-2006. In this paper we discuss the annotation framework (frame semantics) and its cross-lingual applicability, problems arising from exhaustive annotation, strategies for quality control, and possible applications.

pdf
Automatic Extraction of Definitions from German Court Decisions
Stephan Walter | Manfred Pinkal
Proceedings of the Workshop on Information Extraction Beyond The Document