Tomáš Vitvar
2016
Crowdsourced Corpus with Entity Salience Annotations
Milan Dojchinovski | Dinesh Reddy | Tomáš Kliegr | Tomáš Vitvar | Harald Sack
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
In this paper, we present a crowdsourced dataset that adds entity salience (importance) annotations to the Reuters-128 dataset, a subset of Reuters-21578. The dataset is distributed under a free license and published in the NLP Interchange Format, which fosters interoperability and reuse. We show the potential of the dataset on the task of learning an entity salience classifier and report the results of several experiments.