Extended Named Entities Annotation on OCRed Documents: From Corpus Constitution to Evaluation Campaign
Olivier Galibert, Sophie Rosset, Cyril Grouin, Pierre Zweigenbaum, Ludovic Quintard
Abstract
Within the framework of the Quaero project, we proposed a new definition of named entities that extends both their coverage and their structure. Under this definition, extended named entities are both hierarchical and compositional. This paper focuses on the annotation of a corpus of press archives OCRed from French newspapers published in December 1890. We present the methodology used to produce the corpus and the characteristics of the corpus in terms of named entity annotation. This annotated corpus was used in an evaluation campaign. We describe this evaluation, the metrics used, and the results obtained by the participants.
- Anthology ID:
- L12-1166
- Volume:
- Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
- Month:
- May
- Year:
- 2012
- Address:
- Istanbul, Turkey
- Venue:
- LREC
- Publisher:
- European Language Resources Association (ELRA)
- Pages:
- 3126–3131
- URL:
- http://www.lrec-conf.org/proceedings/lrec2012/pdf/343_Paper.pdf
- Cite (ACL):
- Olivier Galibert, Sophie Rosset, Cyril Grouin, Pierre Zweigenbaum, and Ludovic Quintard. 2012. Extended Named Entities Annotation on OCRed Documents: From Corpus Constitution to Evaluation Campaign. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), pages 3126–3131, Istanbul, Turkey. European Language Resources Association (ELRA).
- Cite (Informal):
- Extended Named Entities Annotation on OCRed Documents: From Corpus Constitution to Evaluation Campaign (Galibert et al., LREC 2012)