Steffen Staab


Is Language Modeling Enough? Evaluating Effective Embedding Combinations
Rudolf Schneider | Tom Oberhauser | Paul Grundmann | Felix Alexander Gers | Alexander Loeser | Steffen Staab
Proceedings of the Twelfth Language Resources and Evaluation Conference

Universal embeddings, such as BERT or ELMo, are useful for a broad set of natural language processing tasks like text classification or sentiment analysis. Moreover, specialized embeddings also exist for tasks like topic modeling or named entity disambiguation. We study whether we can complement these universal embeddings with specialized embeddings. We conduct an in-depth evaluation of nine well-known natural language understanding tasks with SentEval. Also, we extend SentEval with two additional tasks in the medical domain. We present PubMedSection, a novel topic classification dataset focused on the biomedical domain. Our comprehensive analysis covers 11 tasks and combinations of six embeddings. We report that combined embeddings outperform state-of-the-art universal embeddings without any embedding fine-tuning. We observe that adding topic-model-based embeddings helps for most tasks and that differing pre-training tasks encode complementary features. Moreover, we present new state-of-the-art results on the MPQA and SUBJ tasks in SentEval.
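The core idea of the paper, complementing a universal embedding with a specialized one, amounts to combining the two representations into a joint feature space before a downstream classifier. A minimal sketch of combination by concatenation (random vectors stand in for real BERT/ELMo and topic-model outputs; the function names and dimensions are illustrative, not the paper's actual setup):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for real encoders: in the paper, the "universal" vector would
# come from e.g. BERT or ELMo, the "specialized" one from e.g. a topic model.
def universal_embed(sentences, dim=768):
    return rng.normal(size=(len(sentences), dim))

def specialized_embed(sentences, dim=50):
    return rng.normal(size=(len(sentences), dim))

def combine(sentences):
    # Combine by concatenation: a SentEval-style probing classifier
    # then operates on the joint feature space.
    u = universal_embed(sentences)
    s = specialized_embed(sentences)
    return np.concatenate([u, s], axis=1)

X = combine(["a short example", "another sentence"])
print(X.shape)  # (2, 818)
```

No fine-tuning is involved in this sketch: both encoders are frozen, which matches the paper's claim that combined embeddings help without any embedding fine-tuning.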


CLEARumor at SemEval-2019 Task 7: ConvoLving ELMo Against Rumors
Ipek Baris | Lukas Schmelzeisen | Steffen Staab
Proceedings of the 13th International Workshop on Semantic Evaluation

This paper describes our submission to SemEval-2019 Task 7: RumourEval: Determining Rumor Veracity and Support for Rumors. We participated in both subtasks. The goal of subtask A is to classify the type of interaction between a rumorous social media post and a reply post as support, query, deny, or comment. The goal of subtask B is to predict the veracity of a given rumor. For subtask A, we implement a CNN-based neural architecture using ELMo embeddings of post text combined with auxiliary features and achieve an F1-score of 44.6%. For subtask B, we employ an MLP neural network leveraging our estimates for subtask A and achieve an F1-score of 30.1% (second place in the competition). We provide results and analysis of our system performance and present ablation experiments.
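The subtask A pipeline, a CNN over contextual word embeddings whose pooled features are concatenated with auxiliary features before classification, can be sketched in plain NumPy. The random matrix stands in for real ELMo output, and all hyperparameters here are illustrative, not the submission's actual configuration:

```python
import numpy as np

rng = np.random.default_rng(0)

seq_len, emb_dim = 20, 1024      # ELMo vectors are 1024-dimensional
n_filters, width = 8, 3          # illustrative CNN hyperparameters
n_classes = 4                    # support / query / deny / comment

post = rng.normal(size=(seq_len, emb_dim))  # stand-in for ELMo embeddings
aux = rng.normal(size=(5,))                 # illustrative auxiliary features

# 1D convolution over the token sequence: one feature map per filter
filters = rng.normal(size=(n_filters, width, emb_dim))
conv = np.array([
    [np.sum(post[t:t + width] * f) for t in range(seq_len - width + 1)]
    for f in filters
])
pooled = conv.max(axis=1)                   # max-over-time pooling

# Concatenate pooled CNN features with auxiliary features, then classify
features = np.concatenate([pooled, aux])
W = rng.normal(size=(n_classes, features.size))
logits = W @ features
probs = np.exp(logits - logits.max())
probs /= probs.sum()                        # softmax over the SDQC labels
print(probs.shape)
```

In the full system, the subtask A class estimates produced this way become input features for the subtask B MLP that predicts rumor veracity.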


A Generalized Language Model as the Combination of Skipped n-grams and Modified Kneser Ney Smoothing
Rene Pickhardt | Thomas Gottron | Martin Körner | Paul Georg Wagner | Till Speicher | Steffen Staab
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)


Clustering Concept Hierarchies from Text
Philipp Cimiano | Andreas Hotho | Steffen Staab
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

Feature Weighting for Co-occurrence-based Classification of Words
Viktor Pekar | Michael Krkoska | Steffen Staab
COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics


Word classification based on combined measures of distributional and semantic similarity
Viktor Pekar | Steffen Staab
10th Conference of the European Chapter of the Association for Computational Linguistics


Taxonomy Learning - Factoring the Structure of a Taxonomy into a Semantic Classification Decision
Viktor Pekar | Steffen Staab
COLING 2002: The 19th International Conference on Computational Linguistics


Knowledge Portals (invited talk)
Steffen Staab
Proceedings of the ACL 2001 Workshop on Human Language Technology and Knowledge Management


From Manual to Semi-Automatic Semantic Annotation: About Ontology-Based Text Annotation Tools
Michael Erdmann | Alexander Maedche | Hans-Peter Schnurr | Steffen Staab
Proceedings of the COLING-2000 Workshop on Semantic Annotation and Intelligent Content