2019
Redesign of the Croatian derivational lexicon
Matea Filko | Krešimir Šojat | Vanja Štefanec
Proceedings of the Second International Workshop on Resources and Tools for Derivational Morphology
2018
Further expansion of the Croatian WordNet
Krešimir Šojat | Matea Filko | Antoni Oliver
Proceedings of the 9th Global Wordnet Conference
This paper presents a semi-automatic procedure for expanding the Croatian Wordnet (CroWN). An English-Croatian dictionary was used to translate monosemous English variants from PWN 3.0. The precision of the automatic process is low (about 30%), but the results proved valuable for the enlargement of CroWN. After manual validation, 10,884 new synset-variant pairs were added to CroWN, bringing the total to 62,075 synset-variant pairs.
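The dictionary-based step described in this abstract can be illustrated with a minimal sketch. It assumes a variant counts as monosemous when it belongs to exactly one synset; the synset identifiers, the toy dictionary and the function name `propose_variants` are hypothetical and are not taken from the paper or from any WordNet toolkit.

```python
from collections import defaultdict

def propose_variants(synsets, dictionary):
    """Propose (synset id, translation) pairs for monosemous English variants.

    synsets:    dict mapping a synset id to its list of English variants
    dictionary: dict mapping an English word to a list of translations
    Both structures are illustrative assumptions.
    """
    # Record which synsets each English variant belongs to.
    membership = defaultdict(set)
    for synset_id, variants in synsets.items():
        for variant in variants:
            membership[variant].add(synset_id)

    proposals = set()
    for variant, ids in membership.items():
        if len(ids) != 1:
            continue  # polysemous variant: translation would be ambiguous
        (synset_id,) = ids
        for translation in dictionary.get(variant, []):
            proposals.add((synset_id, translation))
    return proposals

# Toy data, invented for illustration only.
synsets = {
    "s1": ["dog", "domestic dog"],
    "s2": ["bank"],
    "s3": ["bank", "depository"],
}
dictionary = {
    "dog": ["pas"],
    "bank": ["banka", "obala"],
    "depository": ["spremište"],
}
candidates = propose_variants(synsets, dictionary)
```

Here "bank" is skipped because it occurs in two synsets, and the remaining candidate pairs would still need the manual validation the abstract mentions.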
Designing a Croatian Aspectual Derivatives Dictionary: Preliminary Stages
Kristina Kocijan | Krešimir Šojat | Dario Poljak
Proceedings of the First Workshop on Linguistic Resources for Natural Language Processing
The paper focuses on derivationally connected verbs in Croatian, i.e. on verbs that share the same lexical morpheme and are derived from other verbs via prefixation, suffixation and/or stem alternations. As in other Slavic languages with rich derivational morphology, each verb is marked for aspect, either perfective or imperfective. Some verbs, mostly of foreign origin, are marked as bi-aspectual. The main objective of this paper is to use NooJ to detect and describe the major derivational processes and affixes involved in the derivation of aspectually connected verbs. Annotated chains are exported into a format suitable for a web database system and further used to enhance the aspectual and derivational information for each verb.
2017
Language Generation from DB Query
Kristina Kocijan | Božo Bekavac | Krešimir Šojat
Proceedings of the Linguistic Resources for Automatic Natural Language Generation - LiRA@NLG
2016
Verbal Multiword Expressions in Croatian
Krešimir Šojat | Matea Filko | Daša Farkaš
Proceedings of the Second International Conference on Computational Linguistics in Bulgaria (CLIB 2016)
The paper deals with verbal multiword expressions in Croatian. We focus on four types of verbal constructions: light verb constructions, i.e. constructions consisting of a light verb and a noun or prepositional phrase; complex predicate constructions, i.e. constructions consisting of a finite verb and an infinitive; prepositional verb constructions, i.e. constructions consisting of a verb and a typical preposition; and, finally, verbal idioms, i.e. constructions with completely idiosyncratic meanings. All the constructions are annotated in the Universal Dependency treebank for Croatian. Identifying verbal multiword expressions is important for numerous NLP tasks. It is also important to define and delimit this concept in linguistic theory.
2015
Enlarging the Croatian WordNet with WN-Toolkit and Cro-Deriv
Antoni Oliver | Krešimir Šojat | Matea Srebačić
Proceedings of the International Conference Recent Advances in Natural Language Processing
2014
CroDeriV: a new resource for processing Croatian morphology
Krešimir Šojat | Matea Srebačić | Marko Tadić | Tin Pavelić
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
The paper deals with the processing of Croatian morphology and presents CroDeriV, a newly developed language resource that contains data about the morphological structure and derivational relatedness of verbs in Croatian. In its present form, CroDeriV contains 14,192 Croatian verbs. Verbs in CroDeriV are analyzed for morphemes and segmented into lexical, derivational and inflectional morphemes. The structure of CroDeriV enables the detection of verbal derivational families in Croatian as well as the distribution and frequency of particular affixes and lexical morphemes. Derivational families consist of a verbal base form and all prefixed or suffixed derivatives detected in available machine-readable Croatian dictionaries and corpora. Language data structured in this way was further used for the expansion of other language resources for Croatian, such as Croatian WordNet and the Croatian Morphological Lexicon. Matching the data from CroDeriV on the one side with Croatian WordNet and the Croatian Morphological Lexicon on the other resulted in a significant enrichment of Croatian WordNet and an enlargement of the Croatian Morphological Lexicon.
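As a rough illustration of how segmented verbs could be grouped into derivational families and how affix frequencies could be counted, consider the sketch below. The record layout (`verb`, `prefixes`, `root`), the hand-written segmentations and both function names are assumptions made for this example; they do not reflect the actual CroDeriV schema.

```python
from collections import Counter, defaultdict

# Toy segmentations written by hand for illustration only.
entries = [
    {"verb": "pisati", "prefixes": [], "root": "pis"},
    {"verb": "napisati", "prefixes": ["na"], "root": "pis"},
    {"verb": "prepisati", "prefixes": ["pre"], "root": "pis"},
    {"verb": "čitati", "prefixes": [], "root": "čit"},
    {"verb": "pročitati", "prefixes": ["pro"], "root": "čit"},
]

def derivational_families(records):
    """Group verbs that share the same lexical morpheme (root)."""
    families = defaultdict(list)
    for record in records:
        families[record["root"]].append(record["verb"])
    return dict(families)

def prefix_frequencies(records):
    """Count how often each prefix occurs across all verbs."""
    counts = Counter()
    for record in records:
        counts.update(record["prefixes"])
    return counts

families = derivational_families(entries)
frequencies = prefix_frequencies(entries)
```

Grouping on the lexical morpheme is only one possible reading of "derivational family"; a fuller treatment would also need to handle suffixes and stem alternations.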
Morphosemantic relations between verbs in Croatian WordNet
Krešimir Šojat | Matea Srebačić
Proceedings of the Seventh Global Wordnet Conference
2012
Generation of Verbal Stems in Derivationally Rich Language
Krešimir Šojat | Nives Mikelić Preradović | Marko Tadić
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
The paper presents a procedure for generating prefixed verbs in Croatian comprising combinations of one, two or three prefixes. The result of this generation process is a pool of derivationally valid prefixed verbs, although not necessarily occurring in corpora. Statistics on the occurrences of the generated verbs in the Croatian National Corpus have been calculated. Further uses of such a language resource containing generated potential verbs are also suggested, namely the enrichment of the Croatian Morphological Lexicon, Croatian Wordnet and CROVALLEX.
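The generate-then-check idea in this abstract can be sketched as follows. This is a simplified illustration, not the paper's procedure: it concatenates prefixes naively, allows the same prefix to repeat, and ignores the sound changes that occur at morpheme boundaries in Croatian. The prefix list, the frequency table and the function names are all hypothetical.

```python
from itertools import product

def generate_prefixed(base, prefixes, max_prefixes=3):
    """Build candidate verbs with 1..max_prefixes prefixes before the base.

    Naive string concatenation only; no morphophonological adjustment.
    """
    candidates = []
    for n in range(1, max_prefixes + 1):
        for combo in product(prefixes, repeat=n):
            candidates.append("".join(combo) + base)
    return candidates

def corpus_counts(candidates, corpus_frequencies):
    """Look up each candidate in a frequency table; unseen forms get 0."""
    return {verb: corpus_frequencies.get(verb, 0) for verb in candidates}

prefixes = ["na", "pre", "do"]          # small illustrative subset
candidates = generate_prefixed("pisati", prefixes)

# Made-up frequencies standing in for real corpus statistics.
toy_frequencies = {"napisati": 120, "prepisati": 45}
counts = corpus_counts(candidates, toy_frequencies)
attested = [verb for verb, n in counts.items() if n > 0]
```

With three prefixes the pool holds 3 + 9 + 27 = 39 candidates, most of which are unattested in the toy table, which mirrors the abstract's point that generated verbs need not occur in corpora.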