Simon Ostermann


Investigating the Encoding of Words in BERT’s Neurons Using Feature Textualization
Tanja Baeumel | Soniya Vijayakumar | Josef van Genabith | Guenter Neumann | Simon Ostermann
Proceedings of the 6th BlackboxNLP Workshop: Analyzing and Interpreting Neural Networks for NLP

Pretrained language models (PLMs) form the basis of most state-of-the-art NLP technologies. Nevertheless, they are essentially black boxes: humans do not have a clear understanding of what knowledge is encoded in different parts of the models, especially in individual neurons. This contrasts with computer vision, where feature visualization provides a decompositional interpretability technique for neurons of vision models. Activation maximization is used to synthesize inherently interpretable visual representations of the information encoded in individual neurons. Our work is inspired by this but presents a cautionary tale on the interpretability of single neurons, based on the first large-scale attempt to adapt activation maximization to NLP, and, more specifically, large PLMs. We propose feature textualization, a technique to produce dense representations of neurons in the PLM word embedding space. We apply feature textualization to the BERT model to investigate whether the knowledge encoded in individual neurons can be interpreted and symbolized. We find that the produced representations can provide insights about the knowledge encoded in individual neurons, but that individual neurons do not represent clear-cut symbolic units of language such as words. Additionally, we use feature textualization to investigate how many neurons are needed to encode words in BERT.

Find-2-Find: Multitask Learning for Anaphora Resolution and Object Localization
Cennet Oguz | Pascal Denis | Emmanuel Vincent | Simon Ostermann | Josef van Genabith
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

In multimodal understanding tasks, visual and linguistic ambiguities can arise. Visual ambiguity can occur when visual objects require a model to ground a referring expression in a video without strong supervision, while linguistic ambiguity can occur from changes in entities in action flows. As an example from the cooking domain, “oil” mixed with “salt” and “pepper” could later be referred to as a “mixture”. Without a clear visual-linguistic alignment, we cannot know which among several objects shown is referred to by the language expression “mixture”, and without resolved antecedents, we cannot pinpoint what the mixture is. We define this chicken-and-egg problem as Visual-linguistic Ambiguity. In this paper, we present Find2Find, a joint anaphora resolution and object localization dataset targeting the problem of visual-linguistic ambiguity, consisting of 500 anaphora-annotated recipes with corresponding videos. We present experimental results of a novel end-to-end joint multitask learning framework for Find2Find that fuses visual and textual information and shows improvements both for anaphora resolution and object localization with one joint model in multitask learning, as compared to a strong single-task baseline.


Proceedings of the First Workshop on Commonsense Inference in Natural Language Processing
Simon Ostermann | Sheng Zhang | Michael Roth | Peter Clark
Proceedings of the First Workshop on Commonsense Inference in Natural Language Processing

Commonsense Inference in Natural Language Processing (COIN) - Shared Task Report
Simon Ostermann | Sheng Zhang | Michael Roth | Peter Clark
Proceedings of the First Workshop on Commonsense Inference in Natural Language Processing

This paper reports on the results of the shared tasks of the COIN workshop at EMNLP-IJCNLP 2019. The tasks consisted of two machine comprehension evaluations, each of which tested a system’s ability to answer questions/queries about a text. Both evaluations were designed such that systems need to exploit commonsense knowledge, for example, in the form of inferences over information that is available in the common ground but not necessarily mentioned in the text. A total of five participating teams submitted systems for the shared tasks, with the best submitted system achieving 90.6% accuracy and 83.7% F1-score on task 1 and task 2, respectively.

MCScript2.0: A Machine Comprehension Corpus Focused on Script Events and Participants
Simon Ostermann | Michael Roth | Manfred Pinkal
Proceedings of the Eighth Joint Conference on Lexical and Computational Semantics (*SEM 2019)

We introduce MCScript2.0, a machine comprehension corpus for the end-to-end evaluation of script knowledge. MCScript2.0 contains approx. 20,000 questions on approx. 3,500 texts, crowdsourced based on a new collection process that results in challenging questions. Half of the questions cannot be answered from the reading texts, but require the use of commonsense and, in particular, script knowledge. We give a thorough analysis of our corpus and show that while the task is not challenging to humans, existing machine comprehension models fail to perform well on the data, even if they make use of a commonsense knowledge base. The dataset is available at


Mapping Texts to Scripts: An Entailment Study
Simon Ostermann | Hannah Seitz | Stefan Thater | Manfred Pinkal
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

MCScript: A Novel Dataset for Assessing Machine Comprehension Using Script Knowledge
Simon Ostermann | Ashutosh Modi | Michael Roth | Stefan Thater | Manfred Pinkal
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge
Simon Ostermann | Michael Roth | Ashutosh Modi | Stefan Thater | Manfred Pinkal
Proceedings of the 12th International Workshop on Semantic Evaluation

This report summarizes the results of the SemEval 2018 task on machine comprehension using commonsense knowledge. For this machine comprehension task, we created a new corpus, MCScript. It contains a high number of questions that require commonsense knowledge for finding the correct answer. 11 teams from 4 different countries participated in this shared task, most of them using neural approaches. The best performing system achieves an accuracy of 83.95%, outperforming the baselines by a large margin, but remains far from the human upper bound, which was found to be at 98%.


Aligning Script Events with Narrative Texts
Simon Ostermann | Michael Roth | Stefan Thater | Manfred Pinkal
Proceedings of the 6th Joint Conference on Lexical and Computational Semantics (*SEM 2017)

Script knowledge plays a central role in text understanding and is relevant for a variety of downstream tasks. In this paper, we consider two recent datasets which provide a rich and general representation of script events in terms of paraphrase sets. We introduce the task of mapping event mentions in narrative texts to such script event types, and present a model for this task that exploits rich linguistic representations as well as information on temporal ordering. The results of our experiments demonstrate that this complex task is indeed feasible.


InScript: Narrative texts annotated with script information
Ashutosh Modi | Tatjana Anikina | Simon Ostermann | Manfred Pinkal
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

This paper presents the InScript corpus (Narrative Texts Instantiating Script structure). InScript is a corpus of 1,000 stories centered around 10 different scenarios. Verbs and noun phrases are annotated with event and participant types, respectively. Additionally, the text is annotated with coreference information. The corpus shows rich lexical variation and will serve as a unique resource for the study of the role of script knowledge in natural language processing.


Annotating Entailment Relations for Shortanswer Questions
Simon Ostermann | Andrea Horbach | Manfred Pinkal
Proceedings of the 2nd Workshop on Natural Language Processing Techniques for Educational Applications


Paraphrase Detection for Short Answer Scoring
Nikolina Koleva | Andrea Horbach | Alexis Palmer | Simon Ostermann | Manfred Pinkal
Proceedings of the third workshop on NLP for computer-assisted language learning