J. Nathanael Philipp
2026
Semantic Information: A Difference That Makes a Difference
J. Nathanael Philipp | Max Kölbl | Michael Richter
Proceedings of the Fifteenth Language Resources and Evaluation Conference
In the framework of distributional semantics, we introduce a novel notion and operationalisation of semantic information for natural language. The key idea is as follows: a linguistic sign carries semantic information about a document if it reduces the amount of surprisal for a language processor. We consider two systems, an informed one and an uninformed one, and describe semantic information in their terms. Processing effort is quantified via surprisal, where the informed system is ‘aware’ of the linguistic sign and the uninformed one is not. On an English fairy tale corpus and on two German news corpora, we successfully tested the prediction that if the linguistic sign in question carries pre-information, quantified as semantic surprisal, the current level of surprisal for the language processor is reduced. The conclusion is that the degree of semantic information results from the degree of semantic prior information.
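The operationalisation described in the abstract can be sketched as a surprisal difference between the two systems. This is a minimal illustration, not the authors' implementation: the per-word probabilities and the example sign "king" are hypothetical, standing in for whatever the informed and uninformed language models would assign.

```python
import math

def surprisal(p):
    """Surprisal of an event with probability p, in bits."""
    return -math.log2(p)

# Hypothetical per-word probabilities for a short document under two models:
# the informed model conditions on the sign "king", the uninformed one does not.
p_uninformed = {"throne": 0.01, "crown": 0.02, "castle": 0.05}
p_informed   = {"throne": 0.08, "crown": 0.10, "castle": 0.12}

s_un = sum(surprisal(p) for p in p_uninformed.values())
s_in = sum(surprisal(p) for p in p_informed.values())

# Semantic information of the sign about the document = surprisal reduction.
semantic_information = s_un - s_in  # positive iff the sign is informative
```

A positive difference means the sign lowered the processor's surprisal, i.e. it carried semantic pre-information about the document.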
2025
Surprisal in Action: A Comparative Study of LDA and LSA for Keyword Extraction
J. Nathanael Philipp | Max Kölbl | Michael Richter
Proceedings of the 21st Conference on Natural Language Processing (KONVENS 2025): Long and Short Papers
Can information theory unravel the subtext in a Chekhovian short story?
J. Nathanael Philipp | Olav Mueller-Reichau | Matthias Irmer | Michael Richter | Max Kölbl
Proceedings of the 10th Workshop on Slavic Natural Language Processing (Slavic NLP 2025)
In this study, we investigate whether information-theoretic measures such as surprisal can quantify the elusive notion of subtext in a Chekhovian short story. Specifically, we conduct a series of experiments for which we enrich the original text once with (different types of) meaningful glosses and once with fake glosses. For the different texts thus created, we calculate surprisal values using two methods: a bag-of-words model and a large language model. We observe enrichment effects depending on the method, but no interpretable subtext effect.
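The bag-of-words route mentioned in the abstract can be sketched as follows. This is an illustrative toy, not the paper's setup: the sentences, the gloss text, and the choice of a maximum-likelihood unigram model are all assumptions made for the example.

```python
import math
from collections import Counter

def unigram_surprisal(text):
    """Mean per-token surprisal of a text under its own bag-of-words
    (maximum-likelihood unigram) model, in bits."""
    tokens = text.lower().split()
    counts = Counter(tokens)
    n = len(tokens)
    return sum(-math.log2(counts[t] / n) for t in tokens) / n

# Toy stand-ins for the original story text and a gloss-enriched version.
original = "he looked at the steppe and said nothing"
enriched = original + " steppe here connotes emptiness and silence"

# Comparing the two values shows the kind of enrichment effect the study
# measures (here on a toy scale; the paper uses full texts and also an LLM).
s_orig = unigram_surprisal(original)
s_enr = unigram_surprisal(enriched)
```

With real glosses versus fake glosses, the same comparison would be run on both enriched variants to test for a subtext-specific effect.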