David Steinberger

2017

Czech Dataset for Semantic Similarity and Relatedness
Miloslav Konopík | Ondřej Pražák | David Steinberger
Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017

This paper introduces a Czech dataset for semantic similarity and semantic relatedness. The dataset contains word pairs with hand-annotated scores that indicate the semantic similarity and semantic relatedness of the words. The dataset contains 953 word pairs compiled from 9 different sources. It contains words and their contexts taken from real text corpora, including extra examples when the words are ambiguous. The dataset is annotated by 5 independent annotators. The average Spearman correlation coefficient of the annotation agreement is r = 0.81. We provide reference evaluation experiments with several methods for computing semantic similarity and relatedness.
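
As an illustration of the agreement figure quoted in the abstract, the following is a minimal sketch of one common way to compute an average Spearman correlation across annotators: rank each annotator's scores, correlate every pair of annotators, and average the results. The abstract does not say how the authors aggregated the correlations (for example, pairwise or each annotator against the mean of the others), so the pairwise scheme, the function names, and the toy scores below are all assumptions for illustration only.

```python
from itertools import combinations
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) for constant input."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


def mean_pairwise_spearman(annotations):
    """Average Spearman correlation over all pairs of annotators.

    `annotations` holds one score list per annotator, all aligned to the
    same word pairs. Pairwise averaging is an assumed aggregation scheme.
    """
    return mean(spearman(a, b) for a, b in combinations(annotations, 2))


if __name__ == "__main__":
    # Hypothetical scores from three annotators for five word pairs.
    toy_scores = [
        [4.0, 3.5, 1.0, 0.5, 2.0],
        [3.5, 4.0, 1.5, 0.0, 2.5],
        [4.0, 3.0, 0.5, 1.0, 2.0],
    ]
    print(round(mean_pairwise_spearman(toy_scores), 3))
```

With five annotators, as in the dataset described above, this scheme would average ten pairwise correlations.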

2016

UWB at SemEval-2016 Task 2: Interpretable Semantic Textual Similarity with Distributional Semantics for Chunks
Miloslav Konopík | Ondřej Pražák | David Steinberger | Tomáš Brychcín
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)