Matthias Irmer


2025

Can information theory unravel the subtext in a Chekhovian short story?
J. Nathanael Philipp | Olav Mueller-Reichau | Matthias Irmer | Michael Richter | Max Kölbl
Proceedings of the 10th Workshop on Slavic Natural Language Processing (Slavic NLP 2025)

In this study, we investigate whether information-theoretic measures such as surprisal can quantify the elusive notion of subtext in a Chekhovian short story. Specifically, we conduct a series of experiments in which we enrich the original text once with meaningful glosses (of different types) and once with fake glosses. For each of the resulting texts, we calculate surprisal values with two methods: a bag-of-words model and a large language model. We observe enrichment effects that depend on the method, but no interpretable subtext effect.
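As a rough illustration of the surprisal measure mentioned in the abstract, the sketch below computes per-token surprisal, -log2 p(w), under a simple unigram model estimated from the text itself. This is an assumption made for illustration only: it is not the bag-of-words model or the large language model used in the paper, and the function names and the `smoothing` parameter are hypothetical.

```python
import math
from collections import Counter


def unigram_surprisal(tokens, smoothing=0.0):
    """Return the surprisal in bits of each token under a unigram model.

    Illustrative only: probabilities are relative frequencies in `tokens`,
    optionally with add-k smoothing. Not the model used in the paper.
    """
    counts = Counter(tokens)
    # Denominator of the (optionally smoothed) relative-frequency estimate.
    total = len(tokens) + smoothing * len(counts)
    return [-math.log2((counts[t] + smoothing) / total) for t in tokens]


def mean_surprisal(tokens, smoothing=0.0):
    """Average surprisal over a token sequence; 0.0 for an empty sequence."""
    if not tokens:
        return 0.0
    values = unigram_surprisal(tokens, smoothing)
    return sum(values) / len(values)
```

Comparing `mean_surprisal` for an original token list and for a gloss-enriched one would be one simple way to look for an enrichment effect, under the assumptions stated above.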

2014

Creating a Gold Standard Corpus for the Extraction of Chemistry-Disease Relations from Patent Texts
Antje Schlaf | Claudia Bobach | Matthias Irmer
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

This paper describes the creation of a gold standard for chemistry-disease relations in patent texts. We start with an automated annotation of named entities from the domains of chemistry (e.g. “propranolol”) and diseases (e.g. “hypertension”) as well as from related domains such as methods and substances. Domain-relevant relations between these entities, e.g. “propranolol treats hypertension”, are then annotated manually. The corpus is intended to be suitable for developing and evaluating relation extraction methods. In addition, we present two high-precision reasoning methods for automatically extending the set of extracted relations. Chain reasoning provides a method to infer and integrate additional, indirectly expressed relations occurring in relation chains. Enumeration reasoning exploits the frequent occurrence of enumerations in patents and automatically derives additional relations. Both methods can be used to verify and extend the manually annotated data and may also improve automatic relation extraction.
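The idea behind enumeration reasoning can be sketched as follows: when a relation's object is an enumeration, one relation is derived per enumerated item. The code is a minimal sketch under assumptions, not the authors' implementation. The function name `expand_enumeration`, the triple representation, and the splitting rule (commas, semicolons, "and", "or") are all hypothetical, and the input in the usage note is an illustrative phrase, not taken from the corpus.

```python
import re

# Hypothetical separator rule: commas, semicolons, and the words "and"/"or".
_SEPARATORS = re.compile(r"\s*(?:,|;|\band\b|\bor\b)\s*")


def expand_enumeration(subject, relation, object_phrase):
    """Derive one (subject, relation, object) triple per enumerated item.

    Illustrative only: a real system would rely on annotated entity spans
    rather than on surface splitting of the object phrase.
    """
    items = _SEPARATORS.split(object_phrase)
    # Drop empty fragments, e.g. between ", " and "and" in "a, b, and c".
    return [(subject, relation, item.strip()) for item in items if item.strip()]
```

For instance, `expand_enumeration("propranolol", "treats", "hypertension, angina and arrhythmia")` yields three separate "treats" triples, one for each listed condition.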