Ahti Lohk


2020

Some Issues with Building a Multilingual Wordnet
Francis Bond | Luis Morgado da Costa | Michael Wayne Goodman | John Philip McCrae | Ahti Lohk
Proceedings of the 12th Language Resources and Evaluation Conference

In this paper we discuss the experience of bringing together over 40 different wordnets. We introduce some extensions to the GWA wordnet LMF format proposed in Vossen et al. (2016) and look at how this new information can be displayed. Notable extensions include confidence, corpus frequency, orthographic variants, lexicalized and non-lexicalized synsets and lemmas, and new parts of speech, among others. Many of these extensions already exist in multiple wordnets; the challenge was to find a compatible representation. To this end, we introduce a new version of the Open Multilingual Wordnet (Bond and Foster, 2013) that integrates a new set of tools for testing the extensions introduced by this new format, while also ensuring the integrity of the Collaborative Interlingual Index (CILI: Bond et al., 2016) by preventing the same new concept from being introduced through multiple projects.

2019

New Polysemy Structures in Wordnets Induced by Vertical Polysemy
Ahti Lohk | Heili Orav | Kadri Vare | Francis Bond | Rasmus Vaik
Proceedings of the 10th Global Wordnet Conference

This paper studies auto-hyponymy and auto-troponymy relations (or vertical polysemy) in 11 wordnets uploaded to the new Open Multilingual Wordnet (OMW) webpage. We investigate how vertical polysemy forms polysemy structures (or sense clusters) in the semantic hierarchies of the wordnets. Our main results and discoveries are new polysemy structures that have not previously been associated with vertical polysemy, along with some inconsistencies in the semantic relations of the studied wordnets that should not be there. In the case study, we turn our attention to polysemy structures in the Estonian Wordnet (version 2.2.0), analyzing them and providing comments for the lexicographers. In addition, we describe the algorithm for detecting polysemy structures and give an overview of the state of polysemy structures in the 11 wordnets.

2018

An Experiment: Using Google Translate and Semantic Mirrors to Create Synsets with Many Lexical Units
Ahti Lohk | Mati Tombak | Kadri Vare
Proceedings of the 9th Global Wordnet Conference

One of the fundamental building blocks of a wordnet is the synonym set, or synset, which groups together similar word meanings or synonyms. A synset can consist of one or more synonyms. This paper describes an automatic method for composing synsets with multiple synonyms by using Google Translate and the Semantic Mirrors method. We also give an overview of the results and discuss the advantages of the proposed method from a wordnet's point of view.

2016

Tuning Hierarchies in Princeton WordNet
Ahti Lohk | Christiane Fellbaum | Leo Võhandu
Proceedings of the 8th Global WordNet Conference (GWC)

New wordnets are constantly being created around the world, and most take the original Princeton WordNet (PWN) as their starting point. This arguably central position imposes a responsibility on PWN to ensure that its structure is clean and consistent. To validate the hierarchical structures of PWN, we propose applying a system of test patterns, and in this paper we report on how to carry out that validation. In sum, test patterns provide lexicographers with a very powerful tool, which we hope will be adopted by the global wordnet community.

Experiences of Lexicographers and Computer Scientists in Validating Estonian Wordnet with Test Patterns
Ahti Lohk | Heili Orav | Kadri Vare | Leo Võhandu
Proceedings of the 8th Global WordNet Conference (GWC)

New concepts and semantic relations are constantly added to Estonian Wordnet (EstWN) to increase its size. Alongside this, the EstWN hierarchies are validated with test patterns. This parallel work was carried out over the past four years (2011-2014) with 10 different EstWN versions (60-70), as a collaboration between the creators of the test patterns and the lexicographers currently working on EstWN. This paper describes the use of test patterns from the points of view of information scientists (the creators of the test patterns) and of the users (the lexicographers). Using EstWN as an example, we illustrate how the continuous use of test patterns has led to significant improvement of the semantic hierarchies in EstWN.

2014

Dense Components in the Structure of WordNet
Ahti Lohk | Kaarel Allik | Heili Orav | Leo Võhandu
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

This paper introduces a test pattern, named a dense component, for checking inconsistencies in the hierarchical structure of a wordnet. A dense component (viewed as a substructure) points out cases of regular polysemy in the context of multiple inheritance. Regular polysemy is redefined here: lexical concepts (synsets) are used instead of lexical units. All dense components are evaluated by an expert lexicographer. Based on this experiment, we give an overview of the inconsistencies that the test pattern helps to detect. Special attention is paid to the different kinds of corrections made by the lexicographer. The authors find that the greatest benefit of using dense components is that they help to detect whether the regular polysemy is justified or not. An in-depth analysis has been performed for Estonian Wordnet Version 66. Some comparative figures are also given for Estonian Wordnet (EstWN) Version 67 and Princeton WordNet (PrWN) Version 3.1. Only hypernym relations are used in analysing the hierarchies.

Some structural tests for WordNet with results
Ahti Lohk | Heili Orav | Leo Võhandu
Proceedings of the Seventh Global Wordnet Conference

2012

First steps in checking and comparing Princeton WordNet and Estonian Wordnet
Ahti Lohk | Kadri Vare | Leo Võhandu
Proceedings of the EACL 2012 Joint Workshop of LINGVIS & UNCLH