Krešimir Šojat


Redesign of the Croatian derivational lexicon
Matea Filko | Krešimir Šojat | Vanja Štefanec
Proceedings of the Second International Workshop on Resources and Tools for Derivational Morphology


Further expansion of the Croatian WordNet
Krešimir Šojat | Matea Filko | Antoni Oliver
Proceedings of the 9th Global Wordnet Conference

In this paper a semi-automatic procedure for the expansion of the Croatian Wordnet (CroWN) is presented. An English-Croatian dictionary was used in order to translate monosemous PWN 3.0 English variants. The precision of the automatic process is low (about 30%), but the results proved valuable for the enlargement of CroWN. After manual validation, 10,884 new synset-variant pairs were added to CroWN, achieving a total of 62,075 synset-variant pairs.
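
The dictionary-based step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name, the synset identifiers and the toy dictionary below are all hypothetical. It only assumes that a variant counts as monosemous when it belongs to exactly one synset, and that every dictionary translation of such a variant is proposed as a candidate for that synset, to be validated manually afterwards.

```python
from collections import defaultdict

def propose_variants(synsets, dictionary):
    """Propose (synset_id, translation) pairs for monosemous English variants.

    synsets:    dict mapping a synset id to its list of English variants
    dictionary: dict mapping an English lemma to a list of translations
    Both structures are hypothetical stand-ins for PWN and a bilingual dictionary.
    """
    # Index every variant by the synsets it appears in.
    variant_to_synsets = defaultdict(set)
    for synset_id, variants in synsets.items():
        for variant in variants:
            variant_to_synsets[variant].add(synset_id)

    proposals = set()
    for variant, synset_ids in variant_to_synsets.items():
        if len(synset_ids) != 1:
            continue  # polysemous variant: the target synset is ambiguous, skip it
        (synset_id,) = synset_ids
        for translation in dictionary.get(variant, []):
            proposals.add((synset_id, translation))
    return proposals

# Toy data (hypothetical ids and entries, for illustration only).
toy_synsets = {
    "s1": ["dog", "domestic dog"],
    "s2": ["bank"],
    "s3": ["bank", "depository"],
}
toy_dictionary = {
    "dog": ["pas"],
    "bank": ["banka", "obala"],
    "depository": ["spremište"],
}
candidates = propose_variants(toy_synsets, toy_dictionary)
```

In this toy run "bank" is skipped because it occurs in two synsets, while "dog" and "depository" each yield one candidate pair. Candidates produced this way still need manual validation, which is consistent with the low automatic precision reported in the abstract.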

Designing a Croatian Aspectual Derivatives Dictionary: Preliminary Stages
Kristina Kocijan | Krešimir Šojat | Dario Poljak
Proceedings of the First Workshop on Linguistic Resources for Natural Language Processing

The paper focusses on derivationally connected verbs in Croatian, i.e. on verbs that share the same lexical morpheme and are derived from other verbs via prefixation, suffixation and/or stem alternations. As in other Slavic languages with rich derivational morphology, each verb is marked for aspect, either perfective or imperfective. Some verbs, mostly of foreign origin, are marked as bi-aspectual. The main objective of this paper is to detect and describe, using NooJ, the major derivational processes and affixes involved in the derivation of aspectually connected verbs. Annotated chains are exported into a format suitable for a web database system and further used to enhance the aspectual and derivational information for each verb.


Language Generation from DB Query
Kristina Kocijan | Božo Bekavac | Krešimir Šojat
Proceedings of the Linguistic Resources for Automatic Natural Language Generation - LiRA@NLG


Enlarging the Croatian WordNet with WN-Toolkit and Cro-Deriv
Antoni Oliver | Krešimir Šojat | Matea Srebačić
Proceedings of the International Conference Recent Advances in Natural Language Processing


Morphosemantic relations between verbs in Croatian WordNet
Krešimir Šojat | Matea Srebačić
Proceedings of the Seventh Global Wordnet Conference

CroDeriV: a new resource for processing Croatian morphology
Krešimir Šojat | Matea Srebačić | Marko Tadić | Tin Pavelić
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

The paper deals with the processing of Croatian morphology and presents CroDeriV, a newly developed language resource that contains data about the morphological structure and derivational relatedness of verbs in Croatian. In its present shape, CroDeriV contains 14 192 Croatian verbs. Verbs in CroDeriV are analyzed for morphemes and segmented into lexical, derivational and inflectional morphemes. The structure of CroDeriV enables the detection of verbal derivational families in Croatian as well as the distribution and frequency of particular affixes and lexical morphemes. Derivational families consist of a verbal base form and all prefixed or suffixed derivatives detected in available machine readable Croatian dictionaries and corpora. Language data structured in this way was further used for the expansion of other language resources for Croatian, such as Croatian WordNet and the Croatian Morphological Lexicon. Matching the data from CroDeriV on one side and Croatian WordNet and the Croatian Morphological Lexicon on the other resulted in significant enrichment of Croatian WordNet and enlargement of the Croatian Morphological Lexicon.


Generation of Verbal Stems in Derivationally Rich Language
Krešimir Šojat | Nives Mikelić Preradović | Marko Tadić
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

The paper presents a procedure for generating prefixed verbs in Croatian comprising combinations of one, two or three prefixes. The result of this generation process is a pool of derivationally valid prefixed verbs, although not necessarily occurring in corpora. The statistics of occurrences of generated verbs in the Croatian National Corpus have been calculated. Further use of such a language resource with generated potential verbs is also suggested, namely, enrichment of the Croatian Morphological Lexicon, Croatian Wordnet and CROVALLEX.
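
The generation of candidates with one, two or three prefixes can be sketched as follows. This is only an illustration under strong assumptions, not the procedure from the paper: prefixes are simply concatenated with the base verb, a prefix may repeat within a stack, no morphophonological changes or derivational validity checks are applied, and the function names and the frequency figures are hypothetical.

```python
from itertools import product

def generate_prefixed(base, prefixes, max_prefixes=3):
    """Return candidate verbs built by stacking 1..max_prefixes prefixes on base.

    Assumption: plain string concatenation, no validity filtering.
    """
    candidates = []
    for n in range(1, max_prefixes + 1):
        # Every ordered sequence of n prefixes (repetition allowed).
        for combo in product(prefixes, repeat=n):
            candidates.append("".join(combo) + base)
    return candidates

def corpus_counts(candidates, corpus_frequencies):
    """Look up each candidate in a frequency table; unattested forms get 0."""
    return {verb: corpus_frequencies.get(verb, 0) for verb in candidates}

# Toy run with three prefixes and one base verb.
verbs = generate_prefixed("pisati", ["na", "po", "pre"])

# Hypothetical frequency table, invented purely for illustration.
toy_frequencies = {"napisati": 120, "prepisati": 15}
counts = corpus_counts(verbs, toy_frequencies)
attested = [verb for verb, count in counts.items() if count > 0]
```

With three prefixes the sketch produces 3 + 9 + 27 = 39 candidates, most of which are unattested in the toy table. A realistic generator would also have to restrict the combinations to derivationally valid ones, as the abstract indicates.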