Tomasz Śniatowski


2011

Maca – a configurable tool to integrate Polish morphological data
Adam Radziszewski | Tomasz Śniatowski
Proceedings of the Second International Workshop on Free/Open-Source Rule-Based Machine Translation

Several morphological analysers exist for Polish, but most of them are non-free resources. Moreover, different analysers employ different tagsets and tokenisation strategies. This situation calls for a simple and universal framework for combining different sources of morphological information, including existing resources as well as user-provided dictionaries. We present such a configurable framework, in which simple configuration files define the tokenisation strategies and the behaviour of the morphological analysers, including simple tagset conversion.