Abstract
Motivation: Gold Standards for named entities are, ironically, not standard themselves. Some specify the one perfect annotation. Others specify perfectly good alternatives. The concept of Silver standard is relatively new. The objective is consensus rather than perfection. How should the two concepts be best represented and related? Approach: We examine several Biomedical Gold Standards and motivate a new representational format, centroids, which simply and effectively represents name distributions. We define an algorithm for finding centroids, given a set of alternative input annotations and we test the outputs quantitatively and qualitatively. We also define a metric of relatively acceptability on top of the centroid standard. Results: Precision, recall and F-scores of over 0.99 are achieved for the simple sanity check of giving the algorithm Gold Standard inputs. Qualitative analysis of the differences very often reveals errors and incompleteness in the original Gold Standard. Given automatically generated annotations, the centroids effectively represent the range of those contributions and the quality of the centroid annotations is highly competitive with the best of the contributors. Conclusion: Centroids cleanly represent alternative name variations for Silver and Gold Standards. A centroid Silver Standard is derived just like a Gold Standard, only from imperfect inputs.- Anthology ID:
- L12-1364
- Volume:
- Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
- Month:
- May
- Year:
- 2012
- Address:
- Istanbul, Turkey
- Editors:
- Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Mehmet Uğur Doğan, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
- Venue:
- LREC
- SIG:
- Publisher:
- European Language Resources Association (ELRA)
- Note:
- Pages:
- 3894–3900
- Language:
- URL:
- http://www.lrec-conf.org/proceedings/lrec2012/pdf/633_Paper.pdf
- DOI:
- Cite (ACL):
- Ian Lewin, Şenay Kafkas, and Dietrich Rebholz-Schuhmann. 2012. Centroids: Gold standards with distributional variation. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), pages 3894–3900, Istanbul, Turkey. European Language Resources Association (ELRA).
- Cite (Informal):
- Centroids: Gold standards with distributional variation (Lewin et al., LREC 2012)
- PDF:
- http://www.lrec-conf.org/proceedings/lrec2012/pdf/633_Paper.pdf