Michael Strube

2024

pdf abs
Graph-based Clustering for Detecting Semantic Change Across Time and Languages
Xianghe Ma | Michael Strube | Wei Zhao
Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)

Despite the predominance of contextualized embeddings in NLP, approaches to detect semantic change relying on these embeddings and clustering methods underperform simpler counterparts based on static word embeddings. This stems from the poor quality of the clustering methods to produce sense clusters—which struggle to capture word senses, especially those with low frequency. This issue hinders the next step in examining how changes in word senses in one language influence another. To address this issue, we propose a graph-based clustering approach to capture nuanced changes in both high- and low-frequency word senses across time and languages, including the acquisition and loss of these senses over time. Our experimental results show that our approach substantially surpasses previous approaches in the SemEval2020 binary classification task across four languages. Moreover, we showcase the ability of our approach as a versatile visualization tool to detect semantic changes in both intra-language and inter-language setups. We make our code and data publicly available.

2023

pdf abs
SimCSum: Joint Learning of Simplification and Cross-lingual Summarization for Cross-lingual Science Journalism
Mehwish Fatima | Tim Kolber | Katja Markert | Michael Strube
Proceedings of the 4th New Frontiers in Summarization Workshop

Cross-lingual science journalism is a recently introduced task that generates popular science summaries of scientific articles different from the source language for non-expert readers. A popular science summary must contain salient content of the input document while focusing on coherence and comprehensibility. Meanwhile, generating a cross-lingual summary from the scientific texts in a local language for the targeted audience is challenging. Existing research on cross-lingual science journalism investigates the task with a pipeline model to combine text simplification and cross-lingual summarization. We extend the research in cross-lingual science journalism by introducing a novel, multi-task learning architecture that combines the aforementioned NLP tasks. Our approach is to jointly train the two high-level NLP tasks in SimCSum for generating cross-lingual popular science summaries. We investigate the performance of SimCSum against the pipeline model and several other strong baselines with several evaluation metrics and human evaluation. Overall, SimCSum demonstrates statistically significant improvements over the state-of-the-art on two non-synthetic cross-lingual scientific datasets. Furthermore, we conduct an in-depth investigation into the linguistic properties of generated summaries and an error analysis.

pdf abs
DiscoScore: Evaluating Text Generation with BERT and Discourse Coherence
Wei Zhao | Michael Strube | Steffen Eger
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics

Recently, there has been a growing interest in designing text generation systems from a discourse coherence perspective, e.g., modeling the interdependence between sentences. Still, recent BERT-based evaluation metrics are weak in recognizing coherence, and thus are not reliable in a way to spot the discourse-level improvements of those text generation systems. In this work, we introduce DiscoScore, a parametrized discourse metric, which uses BERT to model discourse coherence from different perspectives, driven by Centering theory. Our experiments encompass 16 non-discourse and discourse metrics, including DiscoScore and popular coherence models, evaluated on summarization and document-level machine translation (MT). We find that (i) the majority of BERT-based metrics correlate much worse with human rated coherence than early discourse metrics, invented a decade ago; (ii) the recent state-of-the-art BARTScore is weak when operated at system level—which is particularly problematic as systems are typically compared in this manner. DiscoScore, in contrast, achieves strong system-level correlation with human ratings, not only in coherence but also in factual consistency and other aspects, and surpasses BARTScore by over 10 correlation points on average. Further, aiming to understand DiscoScore, we provide justifications to the importance of discourse coherence for evaluation metrics, and explain the superiority of one variant over another. Our code is available at https://github.com/AIPHES/DiscoScore.

pdf bib
Proceedings of the 4th Workshop on Computational Approaches to Discourse (CODI 2023)
Michael Strube | Chloe Braud | Christian Hardmeier | Junyi Jessy Li | Sharid Loaiciga | Amir Zeldes
Proceedings of the 4th Workshop on Computational Approaches to Discourse (CODI 2023)

pdf abs
HITS at DISRPT 2023: Discourse Segmentation, Connective Detection, and Relation Classification
Wei Liu | Yi Fan | Michael Strube
Proceedings of the 3rd Shared Task on Discourse Relation Parsing and Treebanking (DISRPT 2023)

HITS participated in the Discourse Segmentation (DS, Task 1) and Connective Detection (CD, Task 2) tasks at the DISRPT 2023. Task 1 focuses on segmenting the text into discourse units, while Task 2 aims to detect the discourse connectives. We deployed a framework based on different pre-trained models according to the target language for these two tasks.HITS also participated in the Relation Classification track (Task 3). The main task was recognizing the discourse relation between text spans from different languages. We designed a joint model for languages with a small corpus while separate models for large corpora. The adversarial training strategy is applied to enhance the robustness of relation classifiers.

pdf abs
Cross-lingual Science Journalism: Select, Simplify and Rewrite Summaries for Non-expert Readers
Mehwish Fatima | Michael Strube
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Automating Cross-lingual Science Journalism (CSJ) aims to generate popular science summaries from English scientific texts for non-expert readers in their local language. We introduce CSJ as a downstream task of text simplification and cross-lingual scientific summarization to facilitate science journalists’ work. We analyze the performance of possible existing solutions as baselines for the CSJ task. Based on these findings, we propose to combine the three components - SELECT, SIMPLIFY and REWRITE (SSR) to produce cross-lingual simplified science summaries for non-expert readers. Our empirical evaluation on the Wikipedia dataset shows that SSR significantly outperforms the baselines for the CSJ task and can serve as a strong baseline for future work. We also perform an ablation study investigating the impact of individual components of SSR. Further, we analyze the performance of SSR on a high-quality, real-world CSJ dataset with human evaluation and in-depth analysis, demonstrating the superior performance of SSR for CSJ.

pdf abs
Modeling Structural Similarities between Documents for Coherence Assessment with Graph Convolutional Networks
Wei Liu | Xiyan Fu | Michael Strube
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Coherence is an important aspect of text quality, and various approaches have been applied to coherence modeling. However, existing methods solely focus on a single document’s coherence patterns, ignoring the underlying correlation between documents. We investigate a GCN-based coherence model that is capable of capturing structural similarities between documents. Our model first creates a graph structure for each document, from where we mine different subgraph patterns. We then construct a heterogeneous graph for the training corpus, connecting documents based on their shared subgraphs. Finally, a GCN is applied to the heterogeneous graph to model the connectivity relationships. We evaluate our method on two tasks, assessing discourse coherence and automated essay scoring. Results show that our GCN-based model outperforms all baselines, achieving a new state-of-the-art on both tasks.

pdf abs
Annotation-Inspired Implicit Discourse Relation Classification with Auxiliary Discourse Connective Generation
Wei Liu | Michael Strube
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Implicit discourse relation classification is a challenging task due to the absence of discourse connectives. To overcome this issue, we design an end-to-end neural model to explicitly generate discourse connectives for the task, inspired by the annotation process of PDTB. Specifically, our model jointly learns to generate discourse connectives between arguments and predict discourse relations based on the arguments and the generated connectives. To prevent our relation classifier from being misled by poor connectives generated at the early stage of training while alleviating the discrepancy between training and inference, we adopt Scheduled Sampling to the joint learning. We evaluate our method on three benchmarks, PDTB 2.0, PDTB 3.0, and PCC. Results show that our joint model significantly outperforms various baselines on three datasets, demonstrating its superiority for the task.

pdf abs
Investigating Multilingual Coreference Resolution by Universal Annotations
Haixia Chai | Michael Strube
Findings of the Association for Computational Linguistics: EMNLP 2023

Multilingual coreference resolution (MCR) has been a long-standing and challenging task. With the newly proposed multilingual coreference dataset, CorefUD (Nedoluzhko et al., 2022), we conduct an investigation into the task by using its harmonized universal morphosyntactic and coreference annotations. First, we study coreference by examining the ground truth data at different linguistic levels, namely mention, entity and document levels, and across different genres, to gain insights into the characteristics of coreference across multiple languages. Second, we perform an error analysis of the most challenging cases that the SotA system fails to resolve in the CRAC 2022 shared task using the universal annotations. Last, based on this analysis, we extract features from universal morphosyntactic annotations and integrate these features into a baseline system to assess their potential benefits for the MCR task. Our results show that our best configuration of features improves the baseline by 0.9% F1 score.

2022

The CODI-CRAC 2022 Shared Task on Anaphora Resolution in Dialogues is the second edition of an initiative focused on detecting different types of anaphoric relations in conversations of different kinds. Using five conversational datasets, four of which have been newly annotated with a wide range of anaphoric relations: identity, bridging references and discourse deixis, we defined multiple tasks focusing individually on these key relations. The second edition of the shared task maintained the focus on these relations and used the same datasets as in 2021, but new test data were annotated, the 2021 data were checked, and new subtasks were added. In this paper, we discuss the annotation schemes, the datasets, the evaluation scripts used to assess the system performance on these tasks, and provide a brief summary of the participating systems and the results obtained across 230 runs from three teams, with most submissions achieving significantly better results than our baseline methods.

pdf abs
Evaluating Coreference Resolvers on Community-based Question Answering: From Rule-based to State of the Art
Haixia Chai | Nafise Sadat Moosavi | Iryna Gurevych | Michael Strube
Proceedings of the Fifth Workshop on Computational Models of Reference, Anaphora and Coreference

Coreference resolution is a key step in natural language understanding. Developments in coreference resolution are mainly focused on improving the performance on standard datasets annotated for coreference resolution. However, coreference resolution is an intermediate step for text understanding and it is not clear how these improvements translate into downstream task performance. In this paper, we perform a thorough investigation on the impact of coreference resolvers in multiple settings of community-based question answering task, i.e., answer selection with long answers. Our settings cover multiple text domains and encompass several answer selection methods. We first inspect extrinsic evaluation of coreference resolvers on answer selection by using coreference relations to decontextualize individual sentences of candidate answers, and then annotate a subset of answers with coreference information for intrinsic evaluation. The results of our extrinsic evaluation show that while there is a significant difference between the performance of the rule-based system vs. state-of-the-art neural model on coreference resolution datasets, we do not observe a considerable difference on their impact on downstream models. Our intrinsic evaluation shows that (i) resolving coreference relations on less-formal text genres is more difficult even for trained annotators, and (ii) the values of linguistic-agnostic coreference evaluation metrics do not correlate with the impact on downstream data.

pdf abs
Entity-based Neural Local Coherence Modeling
Sungho Jeon | Michael Strube
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

In this paper, we propose an entity-based neural local coherence model which is linguistically more sound than previously proposed neural coherence models. Recent neural coherence models encode the input document using large-scale pretrained language models. Hence their basis for computing local coherence are words and even sub-words. The analysis of their output shows that these models frequently compute coherence on the basis of connections between (sub-)words which, from a linguistic perspective, should not play a role. Still, these models achieve state-of-the-art performance in several end applications. In contrast to these models, we compute coherence on the basis of entities by constraining the input to noun phrases and proper names. This provides us with an explicit representation of the most important items in sentences leading to the notion of focus. This brings our model linguistically in line with pre-neural models of computing coherence. It also gives us better insight into the behaviour of the model thus leading to better explainability. Our approach is also in accord with a recent study (O’Connor and Andreas, 2021), which shows that most usable information is captured by nouns and verbs in transformer-based language models. We evaluate our model on three downstream tasks showing that it is not only linguistically more sound than previous models but also that it outperforms them in end applications.

pdf abs
Incorporating Centering Theory into Neural Coreference Resolution
Haixia Chai | Michael Strube
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

In recent years, transformer-based coreference resolution systems have achieved remarkable improvements on the CoNLL dataset. However, how coreference resolvers can benefit from discourse coherence is still an open question. In this paper, we propose to incorporate centering transitions derived from centering theory in the form of a graph into a neural coreference model. Our method improves the performance over the SOTA baselines, especially on pronoun resolution in long documents, formal well-structured text, and clusters with scattered mentions.

Writing the conclusion section of radiology reports is essential for communicating the radiology findings and its assessment to physician in a condensed form. In this work, we employ a transformer-based Seq2Seq model for generating the conclusion section of German radiology reports. The model is initialized with the pretrained parameters of a German BERT model and fine-tuned in our downstream task on our domain data. We proposed two strategies to improve the factual correctness of the model. In the first method, next to the abstractive learning objective, we introduce an extraction learning objective to train the decoder in the model to both generate one summary sequence and extract the key findings from the source input. The second approach is to integrate the pointer mechanism into the transformer-based Seq2Seq model. The pointer network helps the Seq2Seq model to choose between generating tokens from the vocabulary or copying parts from the source input during generation. The results of the automatic and human evaluations show that the enhanced Seq2Seq model is capable of generating human-like radiology conclusions and that the improved models effectively reduce the factual errors in the generations despite the small amount of training data.

2021

pdf bib
Proceedings of the CODI-CRAC 2021 Shared Task on Anaphora, Bridging, and Discourse Deixis in Dialogue
Sopan Khosla | Ramesh Manuvinakurike | Vincent Ng | Massimo Poesio | Michael Strube | Carolyn Rosé
Proceedings of the CODI-CRAC 2021 Shared Task on Anaphora, Bridging, and Discourse Deixis in Dialogue

pdf bib abs
The CODI-CRAC 2021 Shared Task on Anaphora, Bridging, and Discourse Deixis in Dialogue
Sopan Khosla | Juntao Yu | Ramesh Manuvinakurike | Vincent Ng | Massimo Poesio | Michael Strube | Carolyn Rosé
Proceedings of the CODI-CRAC 2021 Shared Task on Anaphora, Bridging, and Discourse Deixis in Dialogue

In this paper, we provide an overview of the CODI-CRAC 2021 Shared-Task: Anaphora Resolution in Dialogue. The shared task focuses on detecting anaphoric relations in different genres of conversations. Using five conversational datasets, four of which have been newly annotated with a wide range of anaphoric relations: identity, bridging references and discourse deixis, we defined multiple subtasks focusing individually on these key relations. We discuss the evaluation scripts used to assess the system performance on these subtasks, and provide a brief summary of the participating systems and the results obtained across ?? runs from 5 teams, with most submissions achieving significantly better results than our baseline methods.

pdf abs
Countering the Influence of Essay Length in Neural Essay Scoring
Sungho Jeon | Michael Strube
Proceedings of the Second Workshop on Simple and Efficient Natural Language Processing

Previous work has shown that automated essay scoring systems, in particular machine learning-based systems, are not capable of assessing the quality of essays, but are relying on essay length, a factor irrelevant to writing proficiency. In this work, we first show that state-of-the-art systems, recent neural essay scoring systems, might be also influenced by the correlation between essay length and scores in a standard dataset. In our evaluation, a very simple neural model shows the state-of-the-art performance on the standard dataset. To consider essay content without taking essay length into account, we introduce a simple neural model assessing the similarity of content between an input essay and essays assigned different scores. This neural model achieves performance comparable to the state of the art on a standard dataset as well as on a second dataset. Our findings suggest that neural essay scoring systems should consider the characteristics of datasets to focus on text quality.

pdf abs
A Novel Wikipedia based Dataset for Monolingual and Cross-Lingual Summarization
Mehwish Fatima | Michael Strube
Proceedings of the Third Workshop on New Frontiers in Summarization

Cross-lingual summarization is a challenging task for which there are no cross-lingual scientific resources currently available. To overcome the lack of a high-quality resource, we present a new dataset for monolingual and cross-lingual summarization considering the English-German pair. We collect high-quality, real-world cross-lingual data from Spektrum der Wissenschaft, which publishes human-written German scientific summaries of English science articles on various subjects. The generated Spektrum dataset is small; therefore, we harvest a similar dataset from the Wikipedia Science Portal to complement it. The Wikipedia dataset consists of English and German articles, which can be used for monolingual and cross-lingual summarization. Furthermore, we present a quantitative analysis of the datasets and results of empirical experiments with several existing extractive and abstractive summarization models. The results suggest the viability and usefulness of the proposed dataset for monolingual and cross-lingual summarization.

2020

pdf abs
Centering-based Neural Coherence Modeling with Hierarchical Discourse Segments
Sungho Jeon | Michael Strube
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Previous neural coherence models have focused on identifying semantic relations between adjacent sentences. However, they do not have the means to exploit structural information. In this work, we propose a coherence model which takes discourse structural information into account without relying on human annotations. We approximate a linguistic theory of coherence, Centering theory, which we use to track the changes of focus between discourse segments. Our model first identifies the focus of each sentence, recognized with regards to the context, and constructs the structural relationship for discourse segments by tracking the changes of the focus. The model then incorporates this structural information into a structure-aware transformer. We evaluate our model on two tasks, automated essay scoring and assessing writing quality. Our results demonstrate that our model, built on top of a pretrained language model, achieves state-of-the-art performance on both tasks. We next statistically examine the identified trees of texts assigned to different quality scores. Finally, we investigate what our model learns in terms of theoretical claims.

pdf abs
Reconstructing Manual Information Extraction with DB-to-Document Backprojection: Experiments in the Life Science Domain
Mark-Christoph Müller | Sucheta Ghosh | Maja Rey | Ulrike Wittig | Wolfgang Müller | Michael Strube
Proceedings of the First Workshop on Scholarly Document Processing

We introduce a novel scientific document processing task for making previously inaccessible information in printed paper documents available to automatic processing. We describe our data set of scanned documents and data records from the biological database SABIO-RK, provide a definition of the task, and report findings from preliminary experiments. Rigorous evaluation proved challenging due to lack of gold-standard data and a difficult notion of correctness. Qualitative inspection of results, however, showed the feasibility and usefulness of the task

pdf abs
A Fully Hyperbolic Neural Model for Hierarchical Multi-Class Classification
Federico López | Michael Strube
Findings of the Association for Computational Linguistics: EMNLP 2020

Label inventories for fine-grained entity typing have grown in size and complexity. Nonetheless, they exhibit a hierarchical structure. Hyperbolic spaces offer a mathematically appealing approach for learning hierarchical representations of symbolic data. However, it is not clear how to integrate hyperbolic components into downstream tasks. This is the first work that proposes a fully hyperbolic model for multi-class multi-label classification, which performs all operations in hyperbolic space. We evaluate the proposed model on two challenging datasets and compare to different baselines that operate under Euclidean assumptions. Our hyperbolic model infers the latent hierarchy from the class distribution, captures implicit hyponymic relations in the inventory, and shows performance on par with state-of-the-art methods on fine-grained classification with remarkable reduction of the parameter size. A thorough analysis sheds light on the impact of each component in the final prediction and showcases its ease of integration with Euclidean layers.

pdf abs
A Large Harvested Corpus of Location Metonymy
Kevin Alex Mathews | Michael Strube
Proceedings of the Twelfth Language Resources and Evaluation Conference

Metonymy is a figure of speech in which an entity is referred to by another related entity. The existing datasets of metonymy are either too small in size or lack sufficient coverage. We propose a new, labelled, high-quality corpus of location metonymy called WiMCor, which is large in size and has high coverage. The corpus is harvested semi-automatically from English Wikipedia. We use different labels of varying granularity to annotate the corpus. The corpus can directly be used for training and evaluating automatic metonymy resolution systems. We construct benchmarks for metonymy resolution, and evaluate baseline methods using the new corpus.

pdf bib
Proceedings of the First Workshop on Computational Approaches to Discourse
Chloé Braud | Christian Hardmeier | Junyi Jessy Li | Annie Louis | Michael Strube
Proceedings of the First Workshop on Computational Approaches to Discourse

pdf abs
Evaluation of Coreference Resolution Systems Under Adversarial Attacks
Haixia Chai | Wei Zhao | Steffen Eger | Michael Strube
Proceedings of the First Workshop on Computational Approaches to Discourse

A substantial overlap of coreferent mentions in the CoNLL dataset magnifies the recent progress on coreference resolution. This is because the CoNLL benchmark fails to evaluate the ability of coreference resolvers that requires linking novel mentions unseen at train time. In this work, we create a new dataset based on CoNLL, which largely decreases mention overlaps in the entire dataset and exposes the limitations of published resolvers on two aspects—lexical inference ability and understanding of low-level orthographic noise. Our findings show (1) the requirements for embeddings, used in resolvers, and for coreference resolutions are, by design, in conflict and (2) adversarial approaches are sometimes not legitimate to mitigate the obstacles, as they may falsely introduce mention overlaps in adversarial training and test sets, thus giving an inflated impression for the improvements.

pdf abs
Incremental Neural Lexical Coherence Modeling
Sungho Jeon | Michael Strube
Proceedings of the 28th International Conference on Computational Linguistics

Pretrained language models, neural models pretrained on massive amounts of data, have established the state of the art in a range of NLP tasks. They are based on a modern machine-learning technique, the Transformer which relates all items simultaneously to capture semantic relations in sequences. However, it differs from what humans do. Humans read sentences one-by-one, incrementally. Can neural models benefit by interpreting texts incrementally as humans do? We investigate this question in coherence modeling. We propose a coherence model which interprets sentences incrementally to capture lexical relations between them. We compare the state of the art in each task, simple neural models relying on a pretrained language model, and our model in two downstream tasks. Our findings suggest that interpreting texts incrementally as humans could be useful to design more advanced models.

2019

pdf abs
Fine-Grained Entity Typing in Hyperbolic Space
Federico López | Benjamin Heinzerling | Michael Strube
Proceedings of the 4th Workshop on Representation Learning for NLP (RepL4NLP-2019)

How can we represent hierarchical information present in large type inventories for entity typing? We study the suitability of hyperbolic embeddings to capture hierarchical relations between mentions in context and their target types in a shared vector space. We evaluate on two datasets and propose two different techniques to extract hierarchical information from the type inventory: from an expert-generated ontology and by automatically mining the dataset. The hyperbolic model shows improvements in some but not all cases over its Euclidean counterpart. Our analysis suggests that the adequacy of this geometry depends on the granularity of the type inventory and the representation of its distribution.

pdf abs
Adapting Deep Learning Methods for Mental Health Prediction on Social Media
Ivan Sekulic | Michael Strube
Proceedings of the 5th Workshop on Noisy User-generated Text (W-NUT 2019)

Mental health poses a significant challenge for an individual’s well-being. Text analysis of rich resources, like social media, can contribute to deeper understanding of illnesses and provide means for their early detection. We tackle a challenge of detecting social media users’ mental status through deep learning-based models, moving away from traditional approaches to the task. In a binary classification task on predicting if a user suffers from one of nine different disorders, a hierarchical attention network outperforms previously set benchmarks for four of the disorders. Furthermore, we explore the limitations of our model and analyze phrases relevant for classification by inspecting the model’s word-level attention weights.

pdf abs
Sequence Tagging with Contextual and Non-Contextual Subword Representations: A Multilingual Evaluation
Benjamin Heinzerling | Michael Strube
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Pretrained contextual and non-contextual subword embeddings have become available in over 250 languages, allowing massively multilingual NLP. However, while there is no dearth of pretrained embeddings, the distinct lack of systematic evaluations makes it difficult for practitioners to choose between them. In this work, we conduct an extensive evaluation comparing non-contextual subword embeddings, namely FastText and BPEmb, and a contextual representation method, namely BERT, on multilingual named entity recognition and part-of-speech tagging. We find that overall, a combination of BERT, BPEmb, and character representations works best across languages and tasks. A more detailed analysis reveals different strengths and weaknesses: Multilingual BERT performs well in medium- to high-resource languages, but is outperformed by non-contextual subword embeddings in a low-resource setting.

pdf abs
Using Automatically Extracted Minimum Spans to Disentangle Coreference Evaluation from Boundary Detection
Nafise Sadat Moosavi | Leo Born | Massimo Poesio | Michael Strube
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

The common practice in coreference resolution is to identify and evaluate the maximum span of mentions. The use of maximum spans tangles coreference evaluation with the challenges of mention boundary detection like prepositional phrase attachment. To address this problem, minimum spans are manually annotated in smaller corpora. However, this additional annotation is costly and therefore, this solution does not scale to large corpora. In this paper, we propose the MINA algorithm for automatically extracting minimum spans to benefit from minimum span evaluation in all corpora. We show that the extracted minimum spans by MINA are consistent with those that are manually annotated by experts. Our experiments show that using minimum spans is in particular important in cross-dataset coreference evaluation, in which detected mention boundaries are noisier due to domain shift. We have integrated MINA into https://github.com/ns-moosavi/coval for reporting standard coreference scores based on both maximum and automatically detected minimum spans.

pdf bib
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Tutorials
Anoop Sarkar | Michael Strube
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Tutorials

pdf abs
On the Importance of Subword Information for Morphological Tasks in Truly Low-Resource Languages
Yi Zhu | Benjamin Heinzerling | Ivan Vulić | Michael Strube | Roi Reichart | Anna Korhonen
Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL)

Recent work has validated the importance of subword information for word representation learning. Since subwords increase parameter sharing ability in neural models, their value should be even more pronounced in low-data regimes. In this work, we therefore provide a comprehensive analysis focused on the usefulness of subwords for word representation learning in truly low-resource scenarios and for three representative morphological tasks: fine-grained entity typing, morphological tagging, and named entity recognition. We conduct a systematic study that spans several dimensions of comparison: 1) type of data scarcity which can stem from the lack of task-specific training data, or even from the lack of unannotated data required to train word embeddings, or both; 2) language type by working with a sample of 16 typologically diverse languages including some truly low-resource ones (e.g. Rusyn, Buryat, and Zulu); 3) the choice of the subword-informed word representation method. Our main results show that subword-informed models are universally useful across all language types, with large gains over subword-agnostic embeddings. They also suggest that the effective use of subwords largely depends on the language (type) and the task at hand, as well as on the amount of available data for training the embeddings and task-based models, where having sufficient in-task data is a more critical requirement.

2018

pdf bib
Proceedings of the Second ACL Workshop on Ethics in Natural Language Processing
Mark Alfano | Dirk Hovy | Margaret Mitchell | Michael Strube
Proceedings of the Second ACL Workshop on Ethics in Natural Language Processing

pdf abs
Using Linguistic Features to Improve the Generalization Capability of Neural Coreference Resolvers
Nafise Sadat Moosavi | Michael Strube
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Coreference resolution is an intermediate step for text understanding. It is used in tasks and domains for which we do not necessarily have coreference annotated corpora. Therefore, generalization is of special importance for coreference resolution. However, while recent coreference resolvers have notable improvements on the CoNLL dataset, they struggle to generalize properly to new domains or datasets. In this paper, we investigate the role of linguistic features in building more generalizable coreference resolvers. We show that generalization improves only slightly by merely using a set of additional linguistic features. However, employing features and subsets of their values that are informative for coreference resolution, considerably improves generalization. Thanks to better generalization, our system achieves state-of-the-art results in out-of-domain evaluations, e.g., on WikiCoref, our system, which is trained on CoNLL, achieves on-par performance with a system designed for this dataset.

pdf abs
A Neural Local Coherence Model for Text Quality Assessment
Mohsen Mesgar | Michael Strube
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

We propose a local coherence model that captures the flow of what semantically connects adjacent sentences in a text. We represent the semantics of a sentence by a vector and capture its state at each word of the sentence. We model what relates two adjacent sentences based on the two most similar semantic states, each of which is in one of the sentences. We encode the perceived coherence of a text by a vector, which represents patterns of changes in salient information that relates adjacent sentences. Our experiments demonstrate that our approach is beneficial for two downstream tasks: Readability assessment, in which our model achieves new state-of-the-art results; and essay scoring, in which the combination of our coherence vectors and other task-dependent features significantly improves the performance of a strong essay scorer.

pdf
BPEmb: Tokenization-free Pre-trained Subword Embeddings in 275 Languages
Benjamin Heinzerling | Michael Strube
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

pdf bib abs
Unrestricted Bridging Resolution
Yufang Hou | Katja Markert | Michael Strube
Computational Linguistics, Volume 44, Issue 2 - June 2018

In contrast to identity anaphors, which indicate coreference between a noun phrase and its antecedent, bridging anaphors link to their antecedent(s) via lexico-semantic, frame, or encyclopedic relations. Bridging resolution involves recognizing bridging anaphors and finding links to antecedents. In contrast to most prior work, we tackle both problems. Our work also follows a more wide-ranging definition of bridging than most previous work and does not impose any restrictions on the type of bridging anaphora or relations between anaphor and antecedent. We create a corpus (ISNotes) annotated for information status (IS), bridging being one of the IS subcategories. The annotations reach high reliability for all categories and marginal reliability for the bridging subcategory. We use a two-stage statistical global inference method for bridging resolution. Given all mentions in a document, the first stage, bridging anaphora recognition, recognizes bridging anaphors as a subtask of learning fine-grained IS. We use a cascading collective classification method where (i) collective classification allows us to investigate relations among several mentions and autocorrelation among IS classes and (ii) cascaded classification allows us to tackle class imbalance, important for minority classes such as bridging. We show that our method outperforms current methods both for IS recognition overall as well as for bridging, specifically. The second stage, bridging antecedent selection, finds the antecedents for all predicted bridging anaphors. We investigate the phenomenon of semantically or syntactically related bridging anaphors that share the same antecedent, a phenomenon we call sibling anaphors. We show that taking sibling anaphors into account in a joint inference model improves antecedent selection performance. In addition, we develop semantic and salience features for antecedent selection and suggest a novel method to build the candidate antecedent list for an anaphor, using the discourse scope of the anaphor. Our model outperforms previous work significantly.

pdf abs
Transparent, Efficient, and Robust Word Embedding Access with WOMBAT
Mark-Christoph Müller | Michael Strube
Proceedings of the 27th International Conference on Computational Linguistics: System Demonstrations

We present WOMBAT, a Python tool which supports NLP practitioners in accessing word embeddings from code. WOMBAT addresses common research problems, including unified access, scaling, and robust and reproducible preprocessing. Code that uses WOMBAT for accessing word embeddings is not only cleaner, more readable, and easier to reuse, but also much more efficient than code using standard in-memory methods: a Python script using WOMBAT for evaluating seven large word embedding collections (8.7M embedding vectors in total) on a simple SemEval sentence similarity task involving 250 raw sentence pairs completes in under ten seconds end-to-end on a standard notebook computer.

2017

pdf abs
Event Argument Identification on Dependency Graphs with Bidirectional LSTMs
Alex Judea | Michael Strube
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

In this paper we investigate the performance of event argument identification. We show that the performance is tied to syntactic complexity. Based on this finding, we propose a novel and effective system for event argument identification. Recurrent Neural Networks learn to produce meaningful representations of long and short dependency paths. Convolutional Neural Networks learn to decompose the lexical context of argument candidates. They are combined into a simple system which outperforms a feature-based, state-of-the-art event argument identifier without any manual feature engineering.

pdf bib
Proceedings of the IJCNLP 2017, Tutorial Abstracts
Sadao Kurohashi | Michael Strube
Proceedings of the IJCNLP 2017, Tutorial Abstracts

pdf abs
Revisiting Selectional Preferences for Coreference Resolution
Benjamin Heinzerling | Nafise Sadat Moosavi | Michael Strube
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

Selectional preferences have long been claimed to be essential for coreference resolution. However, they are modeled only implicitly by current coreference resolvers. We propose a dependency-based embedding model of selectional preferences which allows fine-grained compatibility judgments with high coverage. Incorporating our model improves performance, matching state-of-the-art results of a more complex system. However, it comes with a cost that makes it debatable how worthwhile are such improvements.

pdf abs
Trust, but Verify! Better Entity Linking through Automatic Verification
Benjamin Heinzerling | Michael Strube | Chin-Yew Lin
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers

We introduce automatic verification as a post-processing step for entity linking (EL). The proposed method trusts EL system results collectively, by assuming entity mentions are mostly linked correctly, in order to create a semantic profile of the given text using geospatial and temporal information, as well as fine-grained entity types. This profile is then used to automatically verify each linked mention individually, i.e., to predict whether it has been linked correctly or not. Verification allows leveraging a rich set of global and pairwise features that would be prohibitively expensive for EL systems employing global inference. Evaluation shows consistent improvements across datasets and systems. In particular, when applied to state-of-the-art systems, our method yields an absolute improvement in linking performance of up to 1.7 F1 on AIDA/CoNLL’03 and up to 2.4 F1 on the English TAC KBP 2015 TEDL dataset.

pdf abs
Lexical Features in Coreference Resolution: To be Used With Caution
Nafise Sadat Moosavi | Michael Strube
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Lexical features are a major source of information in state-of-the-art coreference resolvers. Lexical features implicitly model some of the linguistic phenomena at a fine granularity level. They are especially useful for representing the context of mentions. In this paper we investigate a drawback of using many lexical features in state-of-the-art coreference resolvers. We show that if coreference resolvers mainly rely on lexical features, they can hardly generalize to unseen domains. Furthermore, we show that the current coreference resolution evaluation is clearly flawed by only evaluating on a specific split of a specific dataset in which there is a notable overlap between the training, development and test sets.

pdf bib abs
Use Generalized Representations, But Do Not Forget Surface Features
Nafise Sadat Moosavi | Michael Strube
Proceedings of the 2nd Workshop on Coreference Resolution Beyond OntoNotes (CORBON 2017)

Only a year ago, all state-of-the-art coreference resolvers were using an extensive amount of surface features. Recently, there was a paradigm shift towards using word embeddings and deep neural networks, where the use of surface features is very limited. In this paper, we show that a simple SVM model with surface features outperforms more complex neural models for detecting anaphoric mentions. Our analysis suggests that using generalized representations and surface features have different strength that should be both taken into account for improving coreference resolution.

pdf bib
Proceedings of the First ACL Workshop on Ethics in Natural Language Processing
Dirk Hovy | Shannon Spruit | Margaret Mitchell | Emily M. Bender | Michael Strube | Hanna Wallach
Proceedings of the First ACL Workshop on Ethics in Natural Language Processing

pdf abs
Using a Graph-based Coherence Model in Document-Level Machine Translation
Leo Born | Mohsen Mesgar | Michael Strube
Proceedings of the Third Workshop on Discourse in Machine Translation

Although coherence is an important aspect of any text generation system, it has received little attention in the context of machine translation (MT) so far. We hypothesize that the quality of document-level translation can be improved if MT models take into account the semantic relations among sentences during translation. We integrate the graph-based coherence model proposed by Mesgar and Strube, (2016) with Docent (Hardmeier et al., 2012, Hardmeier, 2014) a document-level machine translation system. The application of this graph-based coherence modeling approach is novel in the context of machine translation. We evaluate the coherence model and its effects on the quality of the machine translation. The result of our experiments shows that our coherence model slightly improves the quality of translation in terms of the average Meteor score.

2016

pdf
Feature-Rich Error Detection in Scientific Writing Using Logistic Regression
Madeline Remse | Mohsen Mesgar | Michael Strube
Proceedings of the 11th Workshop on Innovative Use of NLP for Building Educational Applications

pdf abs
Microblog Emotion Classification by Computing Similarity in Text, Time, and Space
Anja Summa | Bernd Resch | Michael Strube
Proceedings of the Workshop on Computational Modeling of People’s Opinions, Personality, and Emotions in Social Media (PEOPLES)

Most work in NLP analysing microblogs focuses on textual content thus neglecting temporal and spatial information. We present a new interdisciplinary method for emotion classification that combines linguistic, temporal, and spatial information into a single metric. We create a graph of labeled and unlabeled tweets that encodes the relations between neighboring tweets with respect to their emotion labels. Graph-based semi-supervised learning labels all tweets with an emotion.

pdf
Generating Coherent Summaries of Scientific Articles Using Coherence Patterns
Daraksha Parveen | Mohsen Mesgar | Michael Strube
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

pdf abs
Incremental Global Event Extraction
Alex Judea | Michael Strube
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

Event extraction is a difficult information extraction task. Li et al. (2014) explore the benefits of modeling event extraction and two related tasks, entity mention and relation extraction, jointly. This joint system achieves state-of-the-art performance in all tasks. However, as a system operating only at the sentence level, it misses valuable information from other parts of the document. In this paper, we present an incremental easy-first approach to make the global context of the entire document available to the intra-sentential, state-of-the-art event extractor. We show that our method robustly increases performance on two datasets, namely ACE 2005 and TAC 2015.

pdf
Search Space Pruning: A Simple Solution for Better Coreference Resolvers
Nafise Sadat Moosavi | Michael Strube
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf
Lexical Coherence Graph Modeling Using Word Embeddings
Mohsen Mesgar | Michael Strube
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf
Which Coreference Evaluation Metric Do You Trust? A Proposal for a Link-based Entity Aware Metric
Nafise Sadat Moosavi | Michael Strube
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

2015

pdf abs
Latent Structures for Coreference Resolution
Sebastian Martschat | Michael Strube
Transactions of the Association for Computational Linguistics, Volume 3

Machine learning approaches to coreference resolution vary greatly in the modeling of the problem: while early approaches operated on the mention pair level, current research focuses on ranking architectures and antecedent trees. We propose a unified representation of different approaches to coreference resolution in terms of the structure they operate on. We represent several coreference resolution approaches proposed in the literature in our framework and evaluate their performance. Finally, we conduct a systematic analysis of the output of these approaches, highlighting differences and similarities.

pdf
Event Extraction as Frame-Semantic Parsing
Alex Judea | Michael Strube
Proceedings of the Fourth Joint Conference on Lexical and Computational Semantics

pdf
Graph-based Coherence Modeling For Assessing Readability
Mohsen Mesgar | Michael Strube
Proceedings of the Fourth Joint Conference on Lexical and Computational Semantics

pdf
Topical Coherence for Graph-based Extractive Summarization
Daraksha Parveen | Hans-Martin Ramsl | Michael Strube
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

pdf bib
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)
Chengqing Zong | Michael Strube
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

pdf bib
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)
Chengqing Zong | Michael Strube
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

pdf
Visual Error Analysis for Entity Linking
Benjamin Heinzerling | Michael Strube
Proceedings of ACL-IJCNLP 2015 System Demonstrations

pdf
Plug Latent Structures and Play Coreference Resolution
Sebastian Martschat | Patrick Claus | Michael Strube
Proceedings of ACL-IJCNLP 2015 System Demonstrations

pdf bib
Analyzing and Visualizing Coreference Resolution Errors
Sebastian Martschat | Thierry Göckel | Michael Strube
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations

This paper describes the derivation of distributional semantic representations for open class words relative to a concept inventory, and of concepts relative to open class words through grammatical relations extracted from Wikipedia articles. The concept inventory comes from WikiNet, a large-scale concept network derived from Wikipedia. The distinctive feature of these representations are their relation to a concept network, through which we can compute selectional preferences of open-class words relative to general concepts. The resource thus derived provides a meaning representation that complements the relational representation captured in the concept network. It covers English open-class words, but the concept base is language independent. The resource can be extended to other languages, with the use of language specific dependency parsers. Good results in metonymy resolution show the resource's potential use for NLP applications.

2011

pdf
Unrestricted Coreference Resolution via Global Hypergraph Partitioning
Jie Cai | Éva Mújdricza-Maydt | Michael Strube
Proceedings of the Fifteenth Conference on Computational Natural Language Learning: Shared Task

pdf
Fine-Grained Sentiment Analysis with Structural Features
Cäcilia Zirn | Mathias Niepert | Heiner Stuckenschmidt | Michael Strube
Proceedings of 5th International Joint Conference on Natural Language Processing

pdf bib
WikiNetTK – A Tool Kit for EmbeddingWorld Knowledge in NLP Applications
Alex Judea | Vivi Nastase | Michael Strube
Proceedings of the IJCNLP 2011 System Demonstrations

2010

pdf
Evaluation Metrics For End-to-End Coreference Resolution Systems
Jie Cai | Michael Strube
Proceedings of the SIGDIAL 2010 Conference

pdf
End-to-End Coreference Resolution via Hypergraph Partitioning
Jie Cai | Michael Strube
Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)

pdf abs
WikiNet: A Very Large Scale Multi-Lingual Concept Network
Vivi Nastase | Michael Strube | Benjamin Boerschinger | Caecilia Zirn | Anas Elghafari
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

This paper describes a multi-lingual large-scale concept network obtained automatically by mining for concepts and relations and exploiting a variety of sources of knowledge from Wikipedia. Concepts and their lexicalizations are extracted from Wikipedia pages, in particular from article titles, hyperlinks, disambiguation pages and cross-language links. Relations are extracted from the category and page network, from the category names, from infoboxes and the body of the articles. The resulting network has two main components: (i) a central, language independent index of concepts, which serves to keep track of the concepts' lexicalizations both within a language and across languages, and to separate linguistic expressions of concepts from the relations in which they are involved (concepts themselves are represented as numeric IDs); (ii) a large network built on the basis of the relations extracted, represented as relations between concepts (more specifically, the numeric IDs). The various stages of obtaining the network were separately evaluated, and the results show a qualitative resource.

2009

pdf
Tree Linearization in English: Improving Language Model Based Approaches
Katja Filippova | Michael Strube
Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Short Papers

pdf
Extracting World and Linguistic Knowledge from Wikipedia
Simone Paolo Ponzetto | Michael Strube
Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Tutorial Abstracts

pdf
Finding Hedges by Chasing Weasels: Hedge Detection Using Wikipedia Tags and Shallow Linguistic Features
Viola Ganter | Michael Strube
Proceedings of the ACL-IJCNLP 2009 Conference Short Papers

pdf
Creating an Annotated Corpus for Generating Walking Directions
Stephanie Schuldes | Michael Roth | Anette Frank | Michael Strube
Proceedings of the 2009 Workshop on Language Generation and Summarisation (UCNLG+Sum 2009)

pdf
Combining Collocations, Lexical and Encyclopedic Knowledge for Metonymy Resolution
Vivi Nastase | Michael Strube
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing

2008

pdf
Sentence Fusion via Dependency Graph Compression
Katja Filippova | Michael Strube
Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing

pdf
Dependency Tree Based Sentence Compression
Katja Filippova | Michael Strube
Proceedings of the Fifth International Natural Language Generation Conference

pdf abs
Knowledge Sources for Bridging Resolution in Multi-Party Dialog
Mark-Christoph Mueller | Margot Mieskes | Michael Strube
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

In this paper we investigate the coverage of the two knowledge sources WordNet and Wikipedia for the task of bridging resolution. We report on an annotation experiment which yielded pairs of bridging anaphors and their antecedents in spoken multi-party dialog. Manual inspection of the two knowledge sources showed that, with some interesting exceptions, Wikipedia is superior to WordNet when it comes to the coverage of information necessary to resolve the bridging anaphors in our data set. We further describe a simple procedure for the automatic extraction of the required knowledge from Wikipedia by means of an API, and discuss some of the implications of the procedures performance.

pdf abs
A Three-stage Disfluency Classifier for Multi Party Dialogues
Margot Mieskes | Michael Strube
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

We present work on a three-stage system to detect and classify disfluencies in multi party dialogues. The system consists of a regular expression based module and two machine learning based modules. The results are compared to other work on multi party dialogues and we show that our system outperforms previously reported ones.

pdf abs
Acquiring a Taxonomy from the German Wikipedia
Laura Kassner | Vivi Nastase | Michael Strube
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

This paper presents the process of acquiring a large, domain independent, taxonomy from the German Wikipedia. We build upon a previously implemented platform that extracts a semantic network and taxonomy from the English version of the Wikipedia. We describe two accomplishments of our work: the semantic network for the German language in which isa links are identified and annotated, and an expansion of the platform for easy adaptation for a new language. We identify the platforms strengths and shortcomings, which stem from the scarcity of free processing resources for languages other than English. We show that the taxonomy induction process is highly reliable - evaluated against the German version of WordNet, GermaNet, the resource obtained shows an accuracy of 83.34%.

pdf abs
Parameters for Topic Boundary Detection in Multi-Party Dialogues
Margot Mieskes | Michael Strube
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

We present a topic boundary detection method that searches for connections between sequences of utterances in multi party dialogues. The connections are established based on word identity. We compare our method to a state-of-the art automatic Topic boundary detection method that was also used on multi party dialogues. We checked various methods of preprocessing of the data, including stemming, lemmatization and stopword filtering with a text-based as well as speech-based stopword lists. Using standard evaluation methods we found that our method outperformed the state-of-the art method.

2007

pdf
Extending the Entity-grid Coherence Model to Semantically Related Entities
Katja Filippova | Michael Strube
Proceedings of the Eleventh European Workshop on Natural Language Generation (ENLG 07)

pdf
Generating Constituent Order in German Clauses
Katja Filippova | Michael Strube
Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics

pdf
An API for Measuring the Relatedness of Words in Wikipedia
Simone Paolo Ponzetto | Michael Strube
Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics Companion Volume Proceedings of the Demo and Poster Sessions

2006

pdf
Semantic Role Labeling for Coreference Resolution
Simone Paolo Ponzetto | Michael Strube
Demonstrations

pdf
Using linguistically motivated features for paragraph boundary identification
Katja Filippova | Michael Strube
Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing

pdf abs
Part-of-Speech Tagging of Transcribed Speech
Margot Mieskes | Michael Strube
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

We used four Part-of-Speech taggers, which are available for research purposes and were originally trained on text to tag a corpus of transcribed multiparty spoken dialogues. The assigned tags were then manually corrected. The correction was first used to evaluate the four taggers, then to retrain them. Despite limited resources in time, money and annotators we reached results comparable to those reported for the taggers on text. Based on our experience we present guidelines to produce reliably POS tagged corpora of new domains.

pdf
Exploiting Semantic Role Labeling, WordNet and Wikipedia for Coreference Resolution
Simone Paolo Ponzetto | Michael Strube
Proceedings of the Human Language Technology Conference of the NAACL, Main Conference