Matthias Lindemann

2023

pdf abs
Compositional Generalization without Trees using Multiset Tagging and Latent Permutations
Matthias Lindemann | Alexander Koller | Ivan Titov
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Seq2seq models have been shown to struggle with compositional generalization in semantic parsing, i.e. generalizing to unseen compositions of phenomena that the model handles correctly in isolation.We phrase semantic parsing as a two-step process: we first tag each input token with a multiset of output tokens. Then we arrange the tokens into an output sequence using a new way of parameterizing and predicting permutations. We formulate predicting a permutation as solving a regularized linear program and we backpropagate through the solver. In contrast to prior work, our approach does not place a priori restrictions on possible permutations, making it very expressive.Our model outperforms pretrained seq2seq models and prior work on realistic semantic parsing tasks that require generalization to longer examples. We also outperform non-tree-based models on structural generalization on the COGS benchmark.For the first time, we show that a model without an inductive bias provided by trees achieves high accuracy on generalization to deeper recursion depth.

pdf abs
Compositional Generalisation with Structured Reordering and Fertility Layers
Matthias Lindemann | Alexander Koller | Ivan Titov
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics

Seq2seq models have been shown to struggle with compositional generalisation, i.e. generalising to new and potentially more complex structures than seen during training. Taking inspiration from grammar-based models that excel at compositional generalisation, we present a flexible end-to-end differentiable neural model that composes two structural operations: a fertility step, which we introduce in this work, and a reordering step based on previous work (Wang et al., 2021). To ensure differentiability, we use the expected value of each step, which we compute using dynamic programming. Our model outperforms seq2seq models by a wide margin on challenging compositional splits of realistic semantic parsing tasks that require generalisation to longer examples. It also compares favourably to other models targeting compositional generalisation.

pdf bib
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics: Student Research Workshop
Elisa Bassignana | Matthias Lindemann | Alban Petit
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics: Student Research Workshop

2022

pdf abs
Automatically Discarding Straplines to Improve Data Quality for Abstractive News Summarization
Amr Keleg | Matthias Lindemann | Danyang Liu | Wanqiu Long | Bonnie L. Webber
Proceedings of NLP Power! The First Workshop on Efficient Benchmarking in NLP

Recent improvements in automatic news summarization fundamentally rely on large corpora of news articles and their summaries. These corpora are often constructed by scraping news websites, which results in including not only summaries but also other kinds of texts. Apart from more generic noise, we identify straplines as a form of text scraped from news websites that commonly turn out not to be summaries. The presence of these non-summaries threatens the validity of scraped corpora as benchmarks for news summarization. We have annotated extracts from two news sources that form part of the Newsroom corpus (Grusky et al., 2018), labeling those which were straplines, those which were summaries, and those which were both. We present a rule-based strapline detection method that achieves good performance on a manually annotated test set. Automatic evaluation indicates that removing straplines and noise from the training data of a news summarizer results in higher quality summaries, with improvements as high as 7 points ROUGE score.

2020

pdf abs
Normalizing Compositional Structures Across Graphbanks
Lucia Donatelli | Jonas Groschwitz | Matthias Lindemann | Alexander Koller | Pia Weißenhorn
Proceedings of the 28th International Conference on Computational Linguistics

The emergence of a variety of graph-based meaning representations (MRs) has sparked an important conversation about how to adequately represent semantic structure. MRs exhibit structural differences that reflect different theoretical and design considerations, presenting challenges to uniform linguistic analysis and cross-framework semantic parsing. Here, we ask the question of which design differences between MRs are meaningful and semantically-rooted, and which are superficial. We present a methodology for normalizing discrepancies between MRs at the compositional level (Lindemann et al., 2019), finding that we can normalize the majority of divergent phenomena using linguistically-grounded rules. Our work significantly increases the match in compositional structure between MRs and improves multi-task learning (MTL) in a low-resource setting, serving as a proof of concept for future broad-scale cross-MR normalization.

pdf abs
Fast semantic parsing with well-typedness guarantees
Matthias Lindemann | Jonas Groschwitz | Alexander Koller
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

AM dependency parsing is a linguistically principled method for neural semantic parsing with high accuracy across multiple graphbanks. It relies on a type system that models semantic valency but makes existing parsers slow. We describe an A* parser and a transition-based parser for AM dependency parsing which guarantee well-typedness and improve parsing speed by up to 3 orders of magnitude, while maintaining or improving accuracy.

2019

pdf abs
Saarland at MRP 2019: Compositional parsing across all graphbanks
Lucia Donatelli | Meaghan Fowlie | Jonas Groschwitz | Alexander Koller | Matthias Lindemann | Mario Mina | Pia Weißenhorn
Proceedings of the Shared Task on Cross-Framework Meaning Representation Parsing at the 2019 Conference on Natural Language Learning

We describe the Saarland University submission to the shared task on Cross-Framework Meaning Representation Parsing (MRP) at the 2019 Conference on Computational Natural Language Learning (CoNLL).

pdf abs
Compositional Semantic Parsing across Graphbanks
Matthias Lindemann | Jonas Groschwitz | Alexander Koller
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Most semantic parsers that map sentences to graph-based meaning representations are hand-designed for specific graphbanks. We present a compositional neural semantic parser which achieves, for the first time, competitive accuracies across a diverse range of graphbanks. Incorporating BERT embeddings and multi-task learning improves the accuracy further, setting new states of the art on DM, PAS, PSD, AMR 2015 and EDS.

pdf abs
Verb-Second Effect on Quantifier Scope Interpretation
Asad Sayeed | Matthias Lindemann | Vera Demberg
Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics

Sentences like “Every child climbed a tree” have at least two interpretations depending on the precedence order of the universal quantifier and the indefinite. Previous experimental work explores the role that different mechanisms such as semantic reanalysis and world knowledge may have in enabling each interpretation. This paper discusses a web-based task that uses the verb-second characteristic of German main clauses to estimate the influence of word order variation over world knowledge.

2018

pdf abs
AMR dependency parsing with a typed semantic algebra
Jonas Groschwitz | Matthias Lindemann | Meaghan Fowlie | Mark Johnson | Alexander Koller
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

We present a semantic parser for Abstract Meaning Representations which learns to parse strings into tree representations of the compositional structure of an AMR graph. This allows us to use standard neural techniques for supertagging and dependency tree parsing, constrained by a linguistically principled type system. We present two approximative decoding algorithms, which achieve state-of-the-art accuracy and outperform strong baselines.