Towards High Accuracy Named Entity Recognition for Icelandic
Svanhvít Lilja Ingólfsdóttir, Sigurjón Þorsteinsson, Hrafn Loftsson
Abstract
We report on work in progress that consists of annotating an Icelandic corpus for named entities (NEs) and using it to train a named entity recognizer based on a Bidirectional Long Short-Term Memory model. So far, we have annotated 7,538 NEs appearing in the first 200,000 tokens of a 1 million token corpus, MIM-GOLD, originally developed to serve as a gold standard for part-of-speech tagging. Our best-performing model, trained on this subset of MIM-GOLD and enriched with external word embeddings, obtains an overall F1 score of 81.3% when categorizing NEs into four categories: persons, locations, organizations and miscellaneous. These preliminary results are promising, especially given that 80% of MIM-GOLD has not yet been used for training.
- Anthology ID:
- W19-6142
- Volume:
- Proceedings of the 22nd Nordic Conference on Computational Linguistics
- Month:
- September–October
- Year:
- 2019
- Address:
- Turku, Finland
- Editors:
- Mareike Hartmann, Barbara Plank
- Venue:
- NoDaLiDa
- Publisher:
- Linköping University Electronic Press
- Pages:
- 363–369
- URL:
- https://aclanthology.org/W19-6142
- Cite (ACL):
- Svanhvít Lilja Ingólfsdóttir, Sigurjón Þorsteinsson, and Hrafn Loftsson. 2019. Towards High Accuracy Named Entity Recognition for Icelandic. In Proceedings of the 22nd Nordic Conference on Computational Linguistics, pages 363–369, Turku, Finland. Linköping University Electronic Press.
- Cite (Informal):
- Towards High Accuracy Named Entity Recognition for Icelandic (Ingólfsdóttir et al., NoDaLiDa 2019)
- PDF:
- https://preview.aclanthology.org/landing_page/W19-6142.pdf