Reconstructing NER Corpora: a Case Study on Bulgarian
Iva Marinova, Laska Laskova, Petya Osenova, Kiril Simov, Alexander Popov
Abstract
The paper reports on the usage of deep learning methods for improving a Named Entity Recognition (NER) training corpus and for predicting and annotating new types in a test corpus. We show how the annotations in a type-based corpus of named entities (NE) were populated as occurrences within it, thus ensuring density of the training information. A deep learning model was adopted for discovering inconsistencies in the initial annotation and for learning new NE types. The evaluation results get improved after data curation, randomization and deduplication.
- Anthology ID:
- 2020.lrec-1.571
- Volume:
- Proceedings of the Twelfth Language Resources and Evaluation Conference
- Month:
- May
- Year:
- 2020
- Address:
- Marseille, France
- Venue:
- LREC
- Publisher:
- European Language Resources Association
- Pages:
- 4647–4652
- Language:
- English
- URL:
- https://aclanthology.org/2020.lrec-1.571
- Cite (ACL):
- Iva Marinova, Laska Laskova, Petya Osenova, Kiril Simov, and Alexander Popov. 2020. Reconstructing NER Corpora: a Case Study on Bulgarian. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 4647–4652, Marseille, France. European Language Resources Association.
- Cite (Informal):
- Reconstructing NER Corpora: a Case Study on Bulgarian (Marinova et al., LREC 2020)
- PDF:
- https://preview.aclanthology.org/ingestion-script-update/2020.lrec-1.571.pdf