Judith Eckle-Kohler

2017

pdf abs
Metaheuristic Approaches to Lexical Substitution and Simplification
Sallam Abualhaija | Tristan Miller | Judith Eckle-Kohler | Iryna Gurevych | Karl-Heinz Zimmermann
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers

In this paper, we propose using metaheuristics—in particular, simulated annealing and the new D-Bees algorithm—to solve word sense disambiguation as an optimization problem within a knowledge-based lexical substitution system. We are the first to perform such an extrinsic evaluation of metaheuristics, for which we use two standard lexical substitution datasets, one English and one German. We find that D-Bees has robust performance for both languages, and performs better than simulated annealing, though both achieve good results. Moreover, the D-Bees–based lexical substitution system outperforms state-of-the-art systems on several evaluation metrics. We also show that D-Bees achieves competitive performance in lexical simplification, a variant of lexical substitution.

The Story Cloze test is a recent effort in providing a common test scenario for text understanding systems. As part of the LSDSem 2017 shared task, we present a system based on a deep learning architecture combined with a rich set of manually-crafted linguistic features. The system outperforms all known baselines for the task, suggesting that the chosen approach is promising. We additionally present two methods for generating further training data based on stories from the ROCStories corpus.

pdf abs
Supervised Learning of Automatic Pyramid for Optimization-Based Multi-Document Summarization
Maxime Peyrard | Judith Eckle-Kohler
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

We present a new supervised framework that learns to estimate automatic Pyramid scores and uses them for optimization-based extractive multi-document summarization. For learning automatic Pyramid scores, we developed a method for automatic training data generation which is based on a genetic algorithm using automatic Pyramid as the fitness function. Our experimental evaluation shows that our new framework significantly outperforms strong baselines regarding automatic Pyramid, and that there is much room for improvement in comparison with the upper-bound for automatic Pyramid.

pdf abs
A Principled Framework for Evaluating Summarizers: Comparing Models of Summary Quality against Human Judgments
Maxime Peyrard | Judith Eckle-Kohler
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

We present a new framework for evaluating extractive summarizers, which is based on a principled representation as optimization problem. We prove that every extractive summarizer can be decomposed into an objective function and an optimization technique. We perform a comparative analysis and evaluation of several objective functions embedded in well-known summarizers regarding their correlation with human judgments. Our comparison of these correlations across two datasets yields surprising insights into the role and performance of objective functions in the different summarizers.

pdf abs
Integrating Deep Linguistic Features in Factuality Prediction over Unified Datasets
Gabriel Stanovsky | Judith Eckle-Kohler | Yevgeniy Puzikov | Ido Dagan | Iryna Gurevych
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Previous models for the assessment of commitment towards a predicate in a sentence (also known as factuality prediction) were trained and tested against a specific annotated dataset, subsequently limiting the generality of their results. In this work we propose an intuitive method for mapping three previously annotated corpora onto a single factuality scale, thereby enabling models to be tested across these corpora. In addition, we design a novel model for factuality prediction by first extending a previous rule-based factuality prediction system and applying it over an abstraction of dependency trees, and then using the output of this system in a supervised classifier. We show that this model outperforms previous methods on all three datasets. We make both the unified factuality corpus and our new model publicly available.

2016

pdf abs
Generating Training Data for Semantic Role Labeling based on Label Transfer from Linked Lexical Resources
Silvana Hartmann | Judith Eckle-Kohler | Iryna Gurevych
Transactions of the Association for Computational Linguistics, Volume 4

We present a new approach for generating role-labeled training data using Linked Lexical Resources, i.e., integrated lexical resources that combine several resources (e.g., Word-Net, FrameNet, Wiktionary) by linking them on the sense or on the role level. Unlike resource-based supervision in relation extraction, we focus on complex linguistic annotations, more specifically FrameNet senses and roles. The automatically labeled training data (www.ukp.tu-darmstadt.de/knowledge-based-srl/) are evaluated on four corpora from different domains for the tasks of word sense disambiguation and semantic role classification. Results show that classifiers trained on our generated data equal those resulting from a standard supervised setting.

pdf abs
Crowdsourcing a Large Dataset of Domain-Specific Context-Sensitive Semantic Verb Relations
Maria Sukhareva | Judith Eckle-Kohler | Ivan Habernal | Iryna Gurevych
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

We present a new large dataset of 12403 context-sensitive verb relations manually annotated via crowdsourcing. These relations capture fine-grained semantic information between verb-centric propositions, such as temporal or entailment relations. We propose a novel semantic verb relation scheme and design a multi-step annotation approach for scaling-up the annotations using crowdsourcing. We employ several quality measures and report on agreement scores. The resulting dataset is available under a permissive CreativeCommons license at www.ukp.tu-darmstadt.de/data/verb-relations/. It represents a valuable resource for various applications, such as automatic information consolidation or automatic summarization.

pdf abs
A General Optimization Framework for Multi-Document Summarization Using Genetic Algorithms and Swarm Intelligence
Maxime Peyrard | Judith Eckle-Kohler
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

Extracting summaries via integer linear programming and submodularity are popular and successful techniques in extractive multi-document summarization. However, many interesting optimization objectives are neither submodular nor factorizable into an integer linear program. We address this issue and present a general optimization framework where any function of input documents and a system summary can be plugged in. Our framework includes two kinds of summarizers – one based on genetic algorithms, the other using a swarm intelligence approach. In our experimental evaluation, we investigate the optimization of two information-theoretic summary evaluation metrics and find that our framework yields competitive results compared to several strong summarization baselines. Our comparative analysis of the genetic and swarm summarizers reveals interesting complementary properties.

pdf abs
The Next Step for Multi-Document Summarization: A Heterogeneous Multi-Genre Corpus Built with a Novel Construction Approach
Markus Zopf | Maxime Peyrard | Judith Eckle-Kohler
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

Research in multi-document summarization has focused on newswire corpora since the early beginnings. However, the newswire genre provides genre-specific features such as sentence position which are easy to exploit in summarization systems. Such easy to exploit genre-specific features are available in other genres as well. We therefore present the new hMDS corpus for multi-document summarization, which contains heterogeneous source documents from multiple text genres, as well as summaries with different lengths. For the construction of the corpus, we developed a novel construction approach which is suited to build large and heterogeneous summarization corpora with little effort. The method reverses the usual process of writing summaries for given source documents: it combines already available summaries with appropriate source documents. In a detailed analysis, we show that our new corpus is significantly different from the homogeneous corpora commonly used, and that it is heterogeneous along several dimensions. Our experimental evaluation using well-known state-of-the-art summarization systems shows that our corpus poses new challenges in the field of multi-document summarization. Last but not least, we make our corpus publicly available to the research community at the corpus web page https://github.com/AIPHES/hMDS.

pdf abs
Semi-automatic Detection of Cross-lingual Marketing Blunders based on Pragmatic Label Propagation in Wiktionary
Christian M. Meyer | Judith Eckle-Kohler | Iryna Gurevych
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

We introduce the task of detecting cross-lingual marketing blunders, which occur if a trade name resembles an inappropriate or negatively connotated word in a target language. To this end, we suggest a formal task definition and a semi-automatic method based the propagation of pragmatic labels from Wiktionary across sense-disambiguated translations. Our final tool assists users by providing clues for problematic names in any language, which we simulate in two experiments on detecting previously occurred marketing blunders and identifying relevant clues for established international brands. We conclude the paper with a suggested research roadmap for this new task. To initiate further research, we publish our online demo along with the source code and data at http://uby.ukp.informatik.tu-darmstadt.de/blunder/.

pdf abs
Modeling Extractive Sentence Intersection via Subtree Entailment
Omer Levy | Ido Dagan | Gabriel Stanovsky | Judith Eckle-Kohler | Iryna Gurevych
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

Sentence intersection captures the semantic overlap of two texts, generalizing over paradigms such as textual entailment and semantic text similarity. Despite its modeling power, it has received little attention because it is difficult for non-experts to annotate. We analyze 200 pairs of similar sentences and identify several underlying properties of sentence intersection. We leverage these insights to design an algorithm that decomposes the sentence intersection task into several simpler annotation tasks, facilitating the construction of a high quality dataset via crowdsourcing. We implement this approach and provide an annotated dataset of 1,764 sentence intersections.

pdf
Verbs Taking Clausal and Non-Finite Arguments as Signals of Modality – Revisiting the Issue of Meaning Grounded in Syntax
Judith Eckle-Kohler
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf
Optimizing an Approximation of ROUGE - a Problem-Reduction Approach to Extractive Multi-Document Summarization
Maxime Peyrard | Judith Eckle-Kohler
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

2015

pdf
On the Role of Discourse Markers for Discriminating Claims and Premises in Argumentative Discourse
Judith Eckle-Kohler | Roland Kluge | Iryna Gurevych
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

pdf bib
Linking the Thoughts: Analysis of Argumentation Structures in Scientific Publications
Christian Kirschner | Judith Eckle-Kohler | Iryna Gurevych
Proceedings of the 2nd Workshop on Argumentation Mining

2014

pdf abs
Lexical Substitution Dataset for German
Kostadin Cholakov | Chris Biemann | Judith Eckle-Kohler | Iryna Gurevych
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

This article describes a lexical substitution dataset for German. The whole dataset contains 2,040 sentences from the German Wikipedia, with one target word in each sentence. There are 51 target nouns, 51 adjectives, and 51 verbs randomly selected from 3 frequency groups based on the lemma frequency list of the German WaCKy corpus. 200 sentences have been annotated by 4 professional annotators and the remaining sentences by 1 professional annotator and 5 additional annotators who have been recruited via crowdsourcing. The resulting dataset can be used to evaluate not only lexical substitution systems, but also different sense inventories and word sense disambiguation systems.

pdf
Automated Verb Sense Labelling Based on Linked Lexical Resources
Kostadin Cholakov | Judith Eckle-Kohler | Iryna Gurevych
Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics

2012

pdf abs
UBY-LMF – A Uniform Model for Standardizing Heterogeneous Lexical-Semantic Resources in ISO-LMF
Judith Eckle-Kohler | Iryna Gurevych | Silvana Hartmann | Michael Matuschek | Christian M. Meyer
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

We present UBY-LMF, an LMF-based model for large-scale, heterogeneous multilingual lexical-semantic resources (LSRs). UBY-LMF allows the standardization of LSRs down to a fine-grained level of lexical information by employing a large number of Data Categories from ISOCat. We evaluate UBY-LMF by converting nine LSRs in two languages to the corresponding format: the English WordNet, Wiktionary, Wikipedia, OmegaWiki, FrameNet and VerbNet and the German Wikipedia, Wiktionary and GermaNet. The resulting LSR, UBY (Gurevych et al., 2012), holds interoperable versions of all nine resources which can be queried by an easy to use public Java API. UBY-LMF covers a wide range of information types from expert-constructed and collaboratively constructed resources for English and German, also including links between different resources at the word sense level. It is designed to accommodate further resources and languages as well as automatically mined lexical-semantic knowledge.

This paper describes the Open Linguistics Working Group (OWLG) of the Open Knowledge Foundation (OKFN). The OWLG is an initiative concerned with linguistic data by scholars from diverse fields, including linguistics, NLP, and information science. The primary goal of the working group is to promote the idea of open linguistic resources, to develop means for their representation and to encourage the exchange of ideas across different disciplines. This paper summarizes the progress of the working group, goals that have been identified, problems that we are going to address, and recent activities and ongoing developments. Here, we put particular emphasis on the development of a Linked Open Data (sub-)cloud of linguistic resources that is currently being pursued by several OWLG members.

pdf
Subcat-LMF: Fleshing out a standardized format for subcategorization frame interoperability
Judith Eckle-Kohler | Iryna Gurevych
Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics

pdf
UBY - A Large-Scale Unified Lexical-Semantic Resource Based on LMF
Iryna Gurevych | Judith Eckle-Kohler | Silvana Hartmann | Michael Matuschek | Christian M. Meyer | Christian Wirth
Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics