Abstract
To stimulate research in cross-language entity linking, we present a new test collection for evaluating the accuracy of cross-language entity linking in twenty-one languages. This paper describes an efficient way to create and curate such a collection, judiciously exploiting existing language resources. Queries are created by semi-automatically identifying person names on the English side of a parallel corpus, using judgments obtained through crowdsourcing to identify the entity corresponding to the name, and projecting the English name onto the non-English document using word alignments. Name projections are then curated, again through crowdsourcing. This technique resulted in the first publicly available multilingual cross-language entity linking collection. The collection includes approximately 55,000 queries, comprising between 875 and 4,329 queries for each of twenty-one non-English languages.
- Anthology ID:
- L12-1382
- Volume:
- Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
- Month:
- May
- Year:
- 2012
- Address:
- Istanbul, Turkey
- Editors:
- Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Mehmet Uğur Doğan, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
- Venue:
- LREC
- Publisher:
- European Language Resources Association (ELRA)
- Pages:
- 3106–3110
- URL:
- http://www.lrec-conf.org/proceedings/lrec2012/pdf/655_Paper.pdf
- Cite (ACL):
- Dawn Lawrie, James Mayfield, Paul McNamee, and Douglas Oard. 2012. Creating and Curating a Cross-Language Person-Entity Linking Collection. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), pages 3106–3110, Istanbul, Turkey. European Language Resources Association (ELRA).
- Cite (Informal):
- Creating and Curating a Cross-Language Person-Entity Linking Collection (Lawrie et al., LREC 2012)
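The abstract's projection step, carrying an English person name over to the non-English side of a parallel sentence through word alignments, can be illustrated with a minimal sketch. It assumes zero-indexed Pharaoh-style `i-j` alignment pairs (the format written by aligners such as fast_align) and assumes the projection is simply the smallest contiguous target span covering every token aligned to the English name. The function names are hypothetical, and the paper's actual projection heuristics may differ.

```python
from typing import List, Optional, Sequence, Tuple

Alignment = List[Tuple[int, int]]


def parse_alignment(line: str) -> Alignment:
    """Parse Pharaoh-style 'i-j' pairs: source token i aligns to target token j."""
    pairs = []
    for tok in line.split():
        i, j = tok.split("-")
        pairs.append((int(i), int(j)))
    return pairs


def project_span(src_start: int, src_end: int,
                 alignment: Alignment) -> Optional[Tuple[int, int]]:
    """Map the half-open source span [src_start, src_end) to the smallest
    half-open target span covering all target tokens aligned to it.
    Returns None when no source token in the span is aligned."""
    targets = sorted(j for i, j in alignment if src_start <= i < src_end)
    if not targets:
        return None
    return targets[0], targets[-1] + 1


def project_name(tgt_tokens: Sequence[str], src_span: Tuple[int, int],
                 alignment_line: str) -> Optional[str]:
    """Return the target-language string projected from an English name span."""
    span = project_span(src_span[0], src_span[1], parse_alignment(alignment_line))
    if span is None:
        return None
    return " ".join(tgt_tokens[span[0]:span[1]])


if __name__ == "__main__":
    # English: "Yesterday Angela Merkel visited Paris"; the name is tokens 1-2.
    german = "Gestern besuchte Angela Merkel Paris".split()
    print(project_name(german, (1, 3), "0-0 1-2 2-3 3-1 4-4"))  # → Angela Merkel
```

Because automatic alignments are noisy (a spurious link can stretch the covering span), projections produced this way are only candidates, which is why the abstract describes a second crowdsourced curation pass over them.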