Nefnir: A high accuracy lemmatizer for Icelandic
Svanhvít Lilja Ingólfsdóttir, Hrafn Loftsson, Jón Friðrik Daðason, Kristín Bjarnadóttir
Abstract
Lemmatization, finding the basic morphological form of a word in a corpus, is an important step in many natural language processing tasks when working with morphologically rich languages. We describe and evaluate Nefnir, a new open source lemmatizer for Icelandic. Nefnir uses suffix substitution rules, derived from a large morphological database, to lemmatize tagged text. Evaluation shows that for correctly tagged text, Nefnir obtains an accuracy of 99.55%, and for text tagged with a PoS tagger, the accuracy obtained is 96.88%.- Anthology ID:
- W19-6133
- Volume:
- Proceedings of the 22nd Nordic Conference on Computational Linguistics
- Month:
- September–October
- Year:
- 2019
- Address:
- Turku, Finland
- Editors:
- Mareike Hartmann, Barbara Plank
- Venue:
- NoDaLiDa
- SIG:
- Publisher:
- Linköping University Electronic Press
- Note:
- Pages:
- 310–315
- Language:
- URL:
- https://aclanthology.org/W19-6133
- DOI:
- Cite (ACL):
- Svanhvít Lilja Ingólfsdóttir, Hrafn Loftsson, Jón Friðrik Daðason, and Kristín Bjarnadóttir. 2019. Nefnir: A high accuracy lemmatizer for Icelandic. In Proceedings of the 22nd Nordic Conference on Computational Linguistics, pages 310–315, Turku, Finland. Linköping University Electronic Press.
- Cite (Informal):
- Nefnir: A high accuracy lemmatizer for Icelandic (Ingólfsdóttir et al., NoDaLiDa 2019)
- PDF:
- https://preview.aclanthology.org/ingest-bitext-workshop/W19-6133.pdf