Ewa Rudnicka


2023

Documenting the Open Multilingual Wordnet
Francis Bond | Michael Wayne Goodman | Ewa Rudnicka | Luis Morgado da Costa | Alexandre Rademaker | John P. McCrae
Proceedings of the 12th Global Wordnet Conference

In this project note we describe our work to improve the documentation for the Open Multilingual Wordnet (OMW), a platform integrating many open wordnets. This includes documentation of the OMW website itself as well as of the semantic relations used by the component wordnets. Some of this documentation work was done with the support of the Google Season of Docs. The OMW project page, which links both to the OMW server itself and to the documentation, has moved to a new location: https://omwn.org.

Lexicalised and non-lexicalized multi-word expressions in WordNet: a cross-encoder approach
Marek Maziarz | Łukasz Grabowski | Tadeusz Piotrowski | Ewa Rudnicka | Maciej Piasecki
Proceedings of the 12th Global Wordnet Conference

Focusing on the recognition of multi-word expressions (MWEs), we address the problem of how MWEs are recorded in WordNet. Not all MWEs recorded in that lexical database can be considered lexicalised without doubt (e.g. elements of the wordnet taxonomy, quantifier phrases, certain collocations). In this paper, we use a cross-encoder approach to improve our earlier method, which distinguished between lexicalised and non-lexicalised MWEs found in WordNet using custom-designed rule-based and statistical approaches. We achieve an F1 score close to 80% for the class of lexicalised word combinations, easily beating two baselines (random and majority class). The language model also proves better than a feature-based logistic regression model.
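The abstract above reports an F1 score for a single class (lexicalised MWEs) against random and majority-class baselines. As a rough illustration of how such a per-class score is computed, and why a majority-class baseline can score zero on a class that happens to be the minority, here is a minimal sketch; the labels `"lex"`/`"non"` and the toy data are invented and are not taken from the paper.

```python
from collections import Counter


def f1_for_class(gold, pred, positive):
    """F1 score (harmonic mean of precision and recall) for one class label."""
    tp = sum(1 for g, p in zip(gold, pred) if g == positive and p == positive)
    fp = sum(1 for g, p in zip(gold, pred) if g != positive and p == positive)
    fn = sum(1 for g, p in zip(gold, pred) if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def majority_baseline(gold):
    """Predict the most frequent gold label for every item."""
    label = Counter(gold).most_common(1)[0][0]
    return [label] * len(gold)


# Toy labels (invented): "lex" = lexicalised MWE, "non" = non-lexicalised.
gold = ["lex", "non", "non", "lex", "non"]
pred = ["lex", "non", "lex", "lex", "non"]

print(round(f1_for_class(gold, pred, "lex"), 3))           # 0.8
print(f1_for_class(gold, majority_baseline(gold), "lex"))  # 0.0
```

In this toy set "non" is the majority label, so the majority baseline never predicts "lex" and its F1 for that class is 0.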

2022

Multi-word Lexical Units Recognition in WordNet
Marek Maziarz | Ewa Rudnicka | Łukasz Grabowski
Proceedings of the 18th Workshop on Multiword Expressions @LREC2022

WordNet is a state-of-the-art lexical resource used in many Natural Language Processing tasks, including multi-word expression (MWE) recognition. However, not all MWEs recorded in WordNet can indisputably be called lexicalised. Some of them are semantically compositional and show no signs of idiosyncrasy. This state of affairs affects all evaluation measures that use the list of all WordNet MWEs as a gold standard. We propose a method of distinguishing between lexicalised and non-lexicalised word combinations in WordNet, taking into account lexicality features such as semantic compositionality, MWE length and a translational criterion. Both a rule-based approach and ridge logistic regression are applied, and both beat a random baseline in the precision of singling out lexicalised MWEs, as well as in the recall of ruling out non-lexicalised MWEs.
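Ridge logistic regression, as named above, is logistic regression with an L2 penalty on the weights. The following is a minimal pure-Python sketch trained by batch gradient descent, not the authors' implementation; the two feature columns (a hypothetical compositionality score and MWE length), the hyperparameters and the toy data are all assumptions for illustration. In practice an established machine-learning library would be used.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_ridge_logreg(X, y, lam=0.1, lr=0.1, epochs=2000):
    """Minimise mean log-loss + (lam / 2) * ||w||^2 by batch gradient descent.

    The bias b is not penalised, as is common for ridge-style models.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the L2 penalty
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict_proba(w, b, x):
    """Estimated probability that item x is lexicalised (label 1)."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical features: [compositionality score, MWE length in words];
# label 1 = lexicalised, 0 = non-lexicalised. Toy data only.
X = [[0.9, 2], [0.8, 3], [0.1, 2], [0.2, 3]]
y = [0, 0, 1, 1]
w, b = train_ridge_logreg(X, y)
print(predict_proba(w, b, [0.15, 2]) > 0.5)  # True: low compositionality
```

The penalty strength `lam` trades fit against weight size; larger values shrink the weights towards zero and soften the predicted probabilities.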

2021

The GlobalWordNet Formats: Updates for 2020
John P. McCrae | Michael Wayne Goodman | Francis Bond | Alexandre Rademaker | Ewa Rudnicka | Luis Morgado Da Costa
Proceedings of the 11th Global Wordnet Conference

The Global Wordnet Formats have been introduced to enable wordnets to have a common representation that can be integrated through the Global WordNet Grid. As a result of their adoption, a number of shortcomings of the formats were identified, and in this paper we describe the extensions to the formats that address these issues. These include ordering of senses, dependencies between wordnets, pronunciation, syntactic modelling, relations, sense keys, metadata and RDF support. Furthermore, we provide some perspectives on how these changes help in the integration of wordnets.

A (Non)-Perfect Match: Mapping plWordNet onto PrincetonWordNet
Ewa Rudnicka | Wojciech Witkowski | Maciej Piasecki
Proceedings of the 11th Global Wordnet Conference

The paper reports on the methodology and final results of a large-scale synset mapping between plWordNet and Princeton WordNet. Dedicated manual and semi-automatic mapping procedures as well as interlingual relation types for nouns, verbs, adjectives and adverbs are described. The statistics of all types of interlingual relations are also provided.

Testing agreement between lexicographers: A case of homonymy and polysemy
Marek Maziarz | Francis Bond | Ewa Rudnicka
Proceedings of the 11th Global Wordnet Conference

In this paper we compare the Oxford Lexico and Merriam-Webster dictionaries with Princeton WordNet with respect to the description of semantic (dis)similarity between polysemous and homonymous senses that could be inferred from them. WordNet lacks any explicit description of polysemy or homonymy, but as a network of linked senses it may be used to compute semantic distances between word senses. To compare WordNet with the dictionaries, we transformed sample entry microstructures of the latter into graphs and cross-linked them with the equivalent senses of the former. We found that the dictionaries are in high agreement with each other if one considers polysemy and homonymy together, and in moderate agreement if one focuses on polysemy descriptions alone. Measuring shortest path lengths in WordNet gave results comparable to those on the dictionaries in predicting semantic dissimilarity between polysemous senses, but was less successful at recognising homonymy.
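The abstract above treats WordNet as a network of linked senses in which shortest path length serves as a measure of semantic distance. A minimal breadth-first search sketch over an undirected toy graph is shown below; the node names and edges are invented for illustration, and the exact set of WordNet relations traversed in the study is not specified here.

```python
from collections import deque


def undirected_graph(edges):
    """Build an adjacency map, treating every relation link as undirected."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def shortest_path_length(graph, source, target):
    """Breadth-first search; returns the number of edges, or None if unreachable."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for nbr in graph.get(node, ()):
            if nbr == target:
                return dist + 1
            if nbr not in seen:
                seen.add(nbr)
                queue.append((nbr, dist + 1))
    return None


# Invented toy hierarchy: two unrelated senses of "bank" meet only at the root.
edges = [
    ("bank_river", "slope"), ("slope", "landform"), ("landform", "entity"),
    ("bank_finance", "institution"), ("institution", "organization"),
    ("organization", "entity"),
]
g = undirected_graph(edges)
print(shortest_path_length(g, "bank_river", "bank_finance"))  # 6
```

A long path between two senses of one lemma, as here, is the kind of signal one would expect for homonymy, while closely related (polysemous) senses should be only a few links apart.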

2020

English WordNet 2020: Improving and Extending a WordNet for English using an Open-Source Methodology
John Philip McCrae | Alexandre Rademaker | Ewa Rudnicka | Francis Bond
Proceedings of the LREC 2020 Workshop on Multimodal Wordnets (MMW2020)

WordNet, while one of the most widely used resources for NLP, has not been updated for a long time, and as such a new project, English WordNet, has arisen to continue the development of the model under an open-source paradigm. In this paper, we detail the second release of this resource, entitled "English WordNet 2020". The work has focused, first, on the introduction of new synsets and senses and the development of guidelines for this and, second, on the integration of contributions from other projects. We present the changes in this edition, which total over 15,000 relative to the previous release.

A Dataset of Translational Equivalents Built on the Basis of plWordNet-Princeton WordNet Synset Mapping
Ewa Rudnicka | Tomasz Naskręt
Proceedings of the Twelfth Language Resources and Evaluation Conference

The paper presents a dataset of 11,000 Polish-English translational equivalents in the form of pairs of plWordNet and Princeton WordNet lexical units linked by three types of equivalence links: strong equivalence, regular equivalence and weak equivalence. The resource consists of two subsets. The first subset was built through manual annotation of an extended sample of Polish-English sense pairs, partly extracted at random from synsets linked by interlingual relations such as I-synonymy, I-partial synonymy and I-hyponymy, and partly selected manually from the surrounding synsets in the hypernymy hierarchy. The second subset was created by manually checking lists of sense-equivalent pairs generated automatically with a few simple rule-based heuristics. For both subsets, the same equivalence annotation methodology was adopted, based on the verification of a set of formal, semantic-pragmatic and translational features. The constructed dataset is a novelty in the wordnet domain and can improve the precision of bilingual NLP tasks such as machine translation, bilingual word sense disambiguation and sentiment annotation.

2019

English WordNet 2019 – An Open-Source WordNet for English
John P. McCrae | Alexandre Rademaker | Francis Bond | Ewa Rudnicka | Christiane Fellbaum
Proceedings of the 10th Global Wordnet Conference

We describe the release of a new wordnet for English based on the Princeton WordNet, but now developed under an open-source model. In particular, this version of WordNet, which we call English WordNet 2019 and which has been developed by multiple people around the world through GitHub, fixes many errors in previous wordnets for English. We give some details of the changes that have been made in this version and give some perspectives on likely future changes as this project continues to evolve.

Testing Zipf’s meaning-frequency law with wordnets as sense inventories
Francis Bond | Arkadiusz Janz | Marek Maziarz | Ewa Rudnicka
Proceedings of the 10th Global Wordnet Conference

According to George K. Zipf, more frequent words have more senses. We tested this law using corpora and wordnets of English, Spanish, Portuguese, French, Polish, Japanese, Indonesian and Chinese. We show that the law works well for all of these languages if we take, as Zipf did, mean values of meaning count and averaged ranks. On the other hand, the law fails badly in predicting the number of senses for a single lemma. We also provide evidence that the slope coefficients of the Zipfian log-log linear model may vary from language to language.
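Zipf's meaning-frequency law is usually fitted as a straight line in log-log space: his classic statement m ∝ f^(1/2), combined with the rank-frequency law f ∝ 1/r, corresponds to a slope of about -1/2 for sense count against rank. The sketch below assumes lemmas are already ordered by corpus frequency rank; it averages ranks and sense counts within fixed-size bins (in the spirit of the mean values described above) and fits log(mean senses) against log(mean rank) by ordinary least squares. The binning scheme, function names and counts are assumptions, not the paper's exact procedure or data.

```python
import math
from statistics import mean


def fit_loglog(xs, ys):
    """Ordinary least squares for log(y) = a + b * log(x); returns (a, b)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx, my = mean(lx), mean(ly)
    sxy = sum((u - mx) * (v - my) for u, v in zip(lx, ly))
    sxx = sum((u - mx) ** 2 for u in lx)
    b = sxy / sxx
    return my - b * mx, b


def binned_meaning_fit(sense_counts, bin_size):
    """Fit mean sense count against mean frequency rank.

    sense_counts[i] is the (positive) number of senses of the lemma with
    frequency rank i + 1. Consecutive ranks are grouped into bins, and the
    mean rank and mean sense count of each bin are used as data points.
    """
    mean_ranks, mean_senses = [], []
    for start in range(0, len(sense_counts), bin_size):
        chunk = sense_counts[start:start + bin_size]
        mean_ranks.append(mean(range(start + 1, start + 1 + len(chunk))))
        mean_senses.append(mean(chunk))
    return fit_loglog(mean_ranks, mean_senses)


# Invented sense counts for lemmas ordered by corpus frequency (rank 1 first).
counts = [12, 9, 7, 8, 5, 6, 4, 3, 4, 2, 3, 2, 1, 2, 1, 1]
intercept, slope = binned_meaning_fit(counts, bin_size=4)
print(f"slope = {slope:.2f}")  # negative: rarer lemmas (larger ranks) have fewer senses
```

Averaging within bins smooths out the large per-lemma scatter, which is consistent with the observation above that the law holds on means but fails for individual lemmas.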

plWordNet 4.1 - a Linguistically Motivated, Corpus-based Bilingual Resource
Agnieszka Dziob | Maciej Piasecki | Ewa Rudnicka
Proceedings of the 10th Global Wordnet Conference

The paper presents the latest release of the Polish WordNet, namely plWordNet 4.1. The most significant developments since version 3.0 include new relations for nouns and verbs, the mapping of semantic role relations from the valency lexicon Walenty onto the plWordNet structure, and sense-level inter-lingual mapping. Several statistics are presented to illustrate the development and current state of the wordnet.

2018

Lexical Perspective on Wordnet to Wordnet Mapping
Ewa Rudnicka | Francis Bond | Łukasz Grabowski | Maciej Piasecki | Tadeusz Piotrowski
Proceedings of the 9th Global Wordnet Conference

The paper presents a feature-based model of equivalence targeted at (manual) sense linking between Princeton WordNet and plWordNet. The model incorporates insights from lexicographic and translation theories on bilingual equivalence and draws on the results of earlier synset-level mapping of nouns between Princeton WordNet and plWordNet. It takes into account all basic aspects of language, such as form, meaning and function, and supplements them with (parallel) corpus frequency and translatability. Three types of equivalence are distinguished, namely strong, regular and weak, depending on conformity with the proposed features. The presented solutions are language-neutral and can easily be applied to language pairs other than Polish and English. Sense-level mapping is more fine-grained than the existing synset mappings and thus has great potential for human and machine translation.

2016

plWordNet 3.0 – Almost There
Maciej Piasecki | Stan Szpakowicz | Marek Maziarz | Ewa Rudnicka
Proceedings of the 8th Global WordNet Conference (GWC)

It took us nearly ten years to get from no wordnet for Polish to the largest wordnet ever built. We started small but quickly learned to dream big. Now we are about to release plWordNet 3.0-emo – complete with sentiment and emotions annotated – and a domestic version of Princeton WordNet, larger than WordNet 3.1 by nearly ten thousand newly added words. The paper retraces the road we travelled and talks a little about the future.

Towards a methodology for filtering out gaps and mismatches across wordnets: the case of plWordNet and Princeton WordNet
Ewa Rudnicka | Wojciech Witkowski | Łukasz Grabowski
Proceedings of the 8th Global WordNet Conference (GWC)

This paper presents the results of large-scale noun synset mapping between plWordNet, the wordnet of Polish, and Princeton WordNet, the wordnet of English, which showed a strong predominance of the inter-lingual hyponymy relation over the inter-lingual synonymy relation. Two main sources of this effect are identified in the paper: differences in the construction methodologies of plWN and PWN, and cross-linguistic differences between English and Polish in the lexicalization of concepts and grammatical categories. Next, we propose a typology of specific gaps and mismatches across wordnets and a rule-based system of filters developed specifically to scan all I(inter-lingual)-hyponymy links between plWN and PWN. The proposed system also makes it possible to pinpoint the frequencies of the identified gaps and mismatches.

plWordNet 3.0 – a Comprehensive Lexical-Semantic Resource
Marek Maziarz | Maciej Piasecki | Ewa Rudnicka | Stan Szpakowicz | Paweł Kędzia
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

We have released plWordNet 3.0, a very large wordnet for Polish. In addition to what is expected in wordnets – richly interrelated synsets – it contains sentiment and emotion annotations, a large set of multi-word expressions, and a mapping onto WordNet 3.1. Part of the release is enWordNet 1.0, a substantially enlarged copy of WordNet 3.1, with material added to allow for a more complete mapping. The paper discusses the design principles of plWordNet, its content, its statistical portrait, a comparison with similar resources, and a partial list of applications.

Challenges of Adjective Mapping between plWordNet and Princeton WordNet
Ewa Rudnicka | Wojciech Witkowski | Katarzyna Podlaska
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

The paper presents the strategy and results of mapping adjective synsets between plWordNet (the wordnet of Polish, cf. Piasecki et al. 2009, Maziarz et al. 2013) and Princeton WordNet (cf. Fellbaum 1998). The main challenge of this enterprise has been the very different synset relation structures of the two networks: horizontal and based on the dumbbell model in PWN, vertical and hyponymy-based in plWN. Moreover, the two wordnets differ in the grouping of adjectives into semantic domains and in the size of the adjective category. To handle these contrasts, a series of automatic prompt algorithms and a manual mapping procedure, relying on corresponding synset and lexical unit relations as well as on inter-lingual relations between noun synsets, were proposed in the pilot stage of mapping (Rudnicka et al. 2015). In the paper we discuss the final results of the mapping process and explain example mapping choices. Suggestions for further development of the mapping are also given.

2014

plWordNet as the Cornerstone of a Toolkit of Lexico-semantic Resources
Marek Maziarz | Maciej Piasecki | Ewa Rudnicka | Stan Szpakowicz
Proceedings of the Seventh Global Wordnet Conference

Registers in the System of Semantic Relations in plWordNet
Marek Maziarz | Maciej Piasecki | Ewa Rudnicka | Stan Szpakowicz
Proceedings of the Seventh Global Wordnet Conference

2013

Beyond the Transfer-and-Merge Wordnet Construction: plWordNet and a Comparison with WordNet
Marek Maziarz | Maciej Piasecki | Ewa Rudnicka | Stan Szpakowicz
Proceedings of the International Conference Recent Advances in Natural Language Processing RANLP 2013

2012

A Strategy of Mapping Polish WordNet onto Princeton WordNet
Ewa Rudnicka | Marek Maziarz | Maciej Piasecki | Stan Szpakowicz
Proceedings of COLING 2012: Posters