Abstract
In this paper we present LemmaPL, a tool for lemmatization of multi-word common noun phrases and named entities in Polish. The tool is based on a set of manually crafted rules and heuristics that use a set of dictionaries (including morphological, named-entity, and inflection-pattern dictionaries). The tool reached a lemmatization accuracy of 97.99% on a dataset of multi-word common noun phrases and 86.17% in case-sensitive evaluation on a dataset of named entities.
- Anthology ID:
- R17-1064
- Volume:
- Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017
- Month:
- September
- Year:
- 2017
- Address:
- Varna, Bulgaria
- Editors:
- Ruslan Mitkov, Galia Angelova
- Venue:
- RANLP
- Publisher:
- INCOMA Ltd.
- Pages:
- 483–491
- URL:
- https://doi.org/10.26615/978-954-452-049-6_064
- DOI:
- 10.26615/978-954-452-049-6_064
- Cite (ACL):
- Michał Marcińczuk. 2017. Lemmatization of Multi-word Common Noun Phrases and Named Entities in Polish. In Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017, pages 483–491, Varna, Bulgaria. INCOMA Ltd.
- Cite (Informal):
- Lemmatization of Multi-word Common Noun Phrases and Named Entities in Polish (Marcińczuk, RANLP 2017)
- PDF:
- https://doi.org/10.26615/978-954-452-049-6_064