GerNED: A German Corpus for Named Entity Disambiguation
Danuta Ploch, Leonhard Hennig, Angelina Duka, Ernesto William De Luca, Sahin Albayrak
Abstract
Determining the real-world referents for name mentions of persons, organizations and other named entities in texts has become an important task in many information retrieval scenarios and is referred to as Named Entity Disambiguation (NED). While comprehensive datasets support the development and evaluation of NED approaches for English, there are no public datasets to assess NED systems for other languages, such as German. This paper describes the construction of an NED dataset based on a large corpus of German news articles. The dataset is closely modeled on the datasets used for the Knowledge Base Population tasks of the Text Analysis Conference, and contains gold standard annotations for the NED tasks of Entity Linking, NIL Detection and NIL Clustering. We also present first experimental results on the new dataset for each of these tasks in order to establish a baseline for future research efforts.
- Anthology ID: L12-1078
- Volume: Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
- Month: May
- Year: 2012
- Address: Istanbul, Turkey
- Venue: LREC
- Publisher: European Language Resources Association (ELRA)
- Pages: 3886–3893
- URL: http://www.lrec-conf.org/proceedings/lrec2012/pdf/222_Paper.pdf
- Cite (ACL): Danuta Ploch, Leonhard Hennig, Angelina Duka, Ernesto William De Luca, and Sahin Albayrak. 2012. GerNED: A German Corpus for Named Entity Disambiguation. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), pages 3886–3893, Istanbul, Turkey. European Language Resources Association (ELRA).
- Cite (Informal): GerNED: A German Corpus for Named Entity Disambiguation (Ploch et al., LREC 2012)