Madara Stāde


2023

Latvian WordNet
Peteris Paikens | Agute Klints | Ilze Lokmane | Lauma Pretkalniņa | Laura Rituma | Madara Stāde | Laine Strankale
Proceedings of the 12th Global Wordnet Conference

This paper describes the recently developed Latvian WordNet and the main linguistic principles used in its development. The inventory of words and senses is based on the Tēzaurs.lv online dictionary, with the senses of the most frequently used words restructured based on corpus evidence. The semantic linking methodology adapts Princeton WordNet principles to fit Latvian language usage and the existing linguistic tradition. The semantic links include hyponymy, meronymy, antonymy, similarity, conceptual connection and gradation. We also measure inter-annotator agreement for different types of semantic links. The dataset consists of 7609 words linked in 6515 synsets. Of these words, 1266 are considered fully completed: all their outgoing semantic links are annotated, corpus examples are assigned to each sense, and links to the English Princeton WordNet are in place. The data is available to the public on Tēzaurs.lv as an addition to the general dictionary data, and is also published as a downloadable dataset.

2022

Towards Latvian WordNet
Peteris Paikens | Mikus Grasmanis | Agute Klints | Ilze Lokmane | Lauma Pretkalniņa | Laura Rituma | Madara Stāde | Laine Strankale
Proceedings of the Thirteenth Language Resources and Evaluation Conference

In this paper we describe our current work on creating a WordNet for Latvian based on the principles of the Princeton WordNet. The chosen methodology for word sense definition and sense linking is based on corpus evidence and the existing Tezaurs.lv online dictionary, ensuring a foundation that fits Latvian language usage and the existing linguistic tradition. We cover a wide set of semantic relations, including gradation sets. Currently the dataset consists of 6432 words linked in 5528 synsets, of which 2717 synsets are considered fully completed: they have all outgoing semantic links annotated, corpus examples for each sense, and links to the English Princeton WordNet.