Nefnir: A high accuracy lemmatizer for Icelandic

Svanhvít Lilja Ingólfsdóttir, Hrafn Loftsson, Jón Friðrik Daðason, Kristín Bjarnadóttir


Abstract
Lemmatization, finding the basic morphological form of a word in a corpus, is an important step in many natural language processing tasks when working with morphologically rich languages. We describe and evaluate Nefnir, a new open source lemmatizer for Icelandic. Nefnir uses suffix substitution rules, derived from a large morphological database, to lemmatize tagged text. Evaluation shows that for correctly tagged text, Nefnir obtains an accuracy of 99.55%, and for text tagged with a PoS tagger, the accuracy obtained is 96.88%.
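As a rough illustration of what lemmatizing tagged text with suffix substitution rules can look like, here is a minimal sketch. It is not Nefnir's implementation: the rule table, the tag strings, and the function name `lemmatize` are hypothetical, and the two example rules are written by hand rather than derived from a morphological database. The sketch assumes that a rule is looked up by the PoS tag together with a word-final suffix, and that the longest matching suffix wins.

```python
# Hypothetical rule table: (PoS tag, word suffix) -> lemma suffix.
# The abstract says such rules are derived from a large morphological
# database; these two entries are hand-written for illustration only.
RULES = {
    ("nkfþ", "um"): "ur",  # e.g. "hestum" -> "hestur"
    ("nvfn", "ur"): "a",   # e.g. "konur"  -> "kona"
}


def lemmatize(word, tag, rules=RULES):
    """Return a lemma for `word` given its tag, or `word` if no rule applies."""
    # Try suffixes from longest (the whole word) to shortest (the empty string).
    for start in range(len(word) + 1):
        suffix = word[start:]
        replacement = rules.get((tag, suffix))
        if replacement is not None:
            return word[:start] + replacement
    # Assumed fallback: leave the word unchanged when nothing matches.
    return word


if __name__ == "__main__":
    for word, tag in [("hestum", "nkfþ"), ("konur", "nvfn")]:
        print(word, tag, lemmatize(word, tag))
```

Because the lookup in this sketch is keyed on the tag, a wrong tag would select the wrong rule or none at all, which is consistent with the lower accuracy the abstract reports for automatically tagged text.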
Anthology ID:
W19-6133
Volume:
Proceedings of the 22nd Nordic Conference on Computational Linguistics
Month:
September–October
Year:
2019
Address:
Turku, Finland
Venue:
NoDaLiDa
Publisher:
Linköping University Electronic Press
Pages:
310–315
URL:
https://aclanthology.org/W19-6133
Cite (ACL):
Svanhvít Lilja Ingólfsdóttir, Hrafn Loftsson, Jón Friðrik Daðason, and Kristín Bjarnadóttir. 2019. Nefnir: A high accuracy lemmatizer for Icelandic. In Proceedings of the 22nd Nordic Conference on Computational Linguistics, pages 310–315, Turku, Finland. Linköping University Electronic Press.
Cite (Informal):
Nefnir: A high accuracy lemmatizer for Icelandic (Ingólfsdóttir et al., NoDaLiDa 2019)
PDF:
https://preview.aclanthology.org/ingestion-script-update/W19-6133.pdf