Lemmatization of Multi-word Common Noun Phrases and Named Entities in Polish

Michał Marcińczuk


Abstract
We present LemmaPL, a tool for lemmatizing multi-word common noun phrases and named entities in Polish. The tool relies on manually crafted rules and heuristics that draw on a set of dictionaries (including morphological, named-entity, and inflection-pattern dictionaries). It reached a lemmatization accuracy of 97.99% on a dataset of multi-word common noun phrases and, under case-sensitive evaluation, 86.17% on a dataset of named entities.
Anthology ID:
R17-1064
Volume:
Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017
Month:
September
Year:
2017
Address:
Varna, Bulgaria
Editors:
Ruslan Mitkov, Galia Angelova
Venue:
RANLP
Publisher:
INCOMA Ltd.
Pages:
483–491
URL:
https://doi.org/10.26615/978-954-452-049-6_064
DOI:
10.26615/978-954-452-049-6_064
Cite (ACL):
Michał Marcińczuk. 2017. Lemmatization of Multi-word Common Noun Phrases and Named Entities in Polish. In Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017, pages 483–491, Varna, Bulgaria. INCOMA Ltd.
Cite (Informal):
Lemmatization of Multi-word Common Noun Phrases and Named Entities in Polish (Marcińczuk, RANLP 2017)
PDF:
https://doi.org/10.26615/978-954-452-049-6_064