Crowdsourced Corpus with Entity Salience Annotations
Milan Dojchinovski, Dinesh Reddy, Tomáš Kliegr, Tomáš Vitvar, Harald Sack
Abstract
In this paper, we present a crowdsourced dataset that adds entity salience (importance) annotations to the Reuters-128 dataset, a subset of Reuters-21578. The dataset is distributed under a free license and published in the NLP Interchange Format, which fosters interoperability and re-use. We show the potential of the dataset on the task of learning an entity salience classifier and report the results of several experiments.
- Anthology ID:
- L16-1527
- Volume:
- Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
- Month:
- May
- Year:
- 2016
- Address:
- Portorož, Slovenia
- Editors:
- Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Sara Goggi, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Helene Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
- Venue:
- LREC
- Publisher:
- European Language Resources Association (ELRA)
- Pages:
- 3307–3311
- URL:
- https://aclanthology.org/L16-1527
- Cite (ACL):
- Milan Dojchinovski, Dinesh Reddy, Tomáš Kliegr, Tomáš Vitvar, and Harald Sack. 2016. Crowdsourced Corpus with Entity Salience Annotations. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), pages 3307–3311, Portorož, Slovenia. European Language Resources Association (ELRA).
- Cite (Informal):
- Crowdsourced Corpus with Entity Salience Annotations (Dojchinovski et al., LREC 2016)
- PDF:
- https://preview.aclanthology.org/improve-issue-templates/L16-1527.pdf