2023
Lexicalised and non-lexicalised multi-word expressions in WordNet: a cross-encoder approach
Marek Maziarz | Łukasz Grabowski | Tadeusz Piotrowski | Ewa Rudnicka | Maciej Piasecki
Proceedings of the 12th Global Wordnet Conference
Focusing on the recognition of multi-word expressions (MWEs), we address the problem of recording MWEs in WordNet. Not all MWEs recorded in that lexical database can be considered indisputably lexicalised (e.g. elements of the wordnet taxonomy, quantifier phrases, certain collocations). In this paper, we use a cross-encoder approach to improve our earlier method of distinguishing between lexicalised and non-lexicalised MWEs found in WordNet, which relied on custom-designed rule-based and statistical approaches. We achieve an F1-measure close to 80% for the class of lexicalised word combinations, easily beating two baselines (random and majority class). The language model also proves better than a feature-based logistic regression model.
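The abstract names the cross-encoder technique only at a high level. The sketch below is a hypothetical illustration, not the authors' actual setup: an MWE and its synset gloss are encoded jointly by a transformer with a binary classification head, and the positive-class probability is read as "lexicalised". The backbone name, the (MWE, gloss) pairing, and the label index are assumptions, and the head would need fine-tuning on labelled data before the scores are meaningful.

```python
# Minimal cross-encoder sketch (illustrative assumptions, not the paper's pipeline):
# the MWE and its gloss are fed as a text pair to a sequence-classification model.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "bert-base-cased"  # hypothetical backbone choice
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)
model.eval()

def lexicalisation_scores(mwe_gloss_pairs):
    """Return P(lexicalised) for each (MWE, gloss) pair; untrained head -> scores
    are meaningless until the model is fine-tuned on labelled MWE data."""
    mwes, glosses = zip(*mwe_gloss_pairs)
    batch = tokenizer(list(mwes), list(glosses),
                      padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**batch).logits
    return torch.softmax(logits, dim=-1)[:, 1].tolist()  # index 1 = "lexicalised" (assumed)

print(lexicalisation_scores([
    ("red herring", "a clue intended to be misleading or distracting"),
    ("large city", "a city of considerable size"),
]))
```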
2022
Multi-word Lexical Units Recognition in WordNet
Marek Maziarz | Ewa Rudnicka | Łukasz Grabowski
Proceedings of the 18th Workshop on Multiword Expressions @LREC2022
WordNet is a state-of-the-art lexical resource used in many Natural Language Processing tasks, including multi-word expression (MWE) recognition. However, not all MWEs recorded in WordNet can be indisputably called lexicalised. Some of them are semantically compositional and show no signs of idiosyncrasy. This state of affairs affects all evaluation measures that use the list of all WordNet MWEs as a gold standard. We propose a method of distinguishing between lexicalised and non-lexicalised word combinations in WordNet, taking into account lexicality features such as semantic compositionality, MWE length and a translational criterion. Both a rule-based approach and ridge logistic regression are applied, beating a random baseline in the precision of singling out lexicalised MWEs, as well as in the recall of ruling out non-lexicalised MWEs.
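As a rough illustration of the statistical side of such a method, the sketch below fits an L2-regularised ("ridge") logistic regression over a toy feature matrix. The concrete feature encoding (a compositionality score, length in words, a binary translational criterion) and the example values are assumptions made for exposition, not the paper's exact features or data.

```python
# Minimal sketch: ridge (L2-penalised) logistic regression over lexicality features.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy feature matrix: [compositionality_score, length_in_words, translates_as_single_word]
X = np.array([
    [0.2, 2, 1],   # e.g. "red herring"  -> lexicalised
    [0.9, 2, 0],   # e.g. "large city"   -> non-lexicalised
    [0.3, 3, 1],
    [0.8, 4, 0],
])
y = np.array([1, 0, 1, 0])  # 1 = lexicalised, 0 = non-lexicalised

clf = LogisticRegression(penalty="l2", C=1.0, solver="lbfgs").fit(X, y)
print(clf.predict_proba([[0.25, 2, 1]])[:, 1])  # P(lexicalised) for a new MWE
```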
2018
Lexical Perspective on Wordnet to Wordnet Mapping
Ewa Rudnicka | Francis Bond | Łukasz Grabowski | Maciej Piasecki | Tadeusz Piotrowski
Proceedings of the 9th Global Wordnet Conference
The paper presents a feature-based model of equivalence targeted at (manual) sense linking between Princeton WordNet and plWordNet. The model incorporates insights from lexicographic and translation theories on bilingual equivalence and draws on the results of earlier synset-level mapping of nouns between Princeton WordNet and plWordNet. It takes into account all basic aspects of language, such as form, meaning and function, and supplements them with (parallel) corpus frequency and translatability. Three types of equivalence are distinguished, namely strong, regular and weak, depending on conformity with the proposed features. The presented solutions are language-neutral and can easily be applied to language pairs other than Polish and English. Sense-level mapping is more fine-grained than the existing synset mappings and thus holds great potential for human and machine translation.
2016
Towards a methodology for filtering out gaps and mismatches across wordnets: the case of plWordNet and Princeton WordNet
Ewa Rudnicka | Wojciech Witkowski | Łukasz Grabowski
Proceedings of the 8th Global WordNet Conference (GWC)
This paper presents the results of large-scale noun synset mapping between plWordNet, the wordnet of Polish, and Princeton WordNet, the wordnet of English, which showed a high predominance of the inter-lingual hyponymy relation over the inter-lingual synonymy relation. Two main sources of this effect are identified in the paper: differences in the construction methodologies of plWN and PWN, and cross-linguistic differences between English and Polish in the lexicalization of concepts and grammatical categories. Next, we propose a typology of specific gaps and mismatches across wordnets and a rule-based system of filters developed specifically to scan all I(inter-lingual)-hyponymy links between plWN and PWN. The proposed system also makes it possible to pinpoint the frequencies of the identified gaps and mismatches.
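The abstract does not spell out the individual filters; the following minimal sketch only illustrates the general shape of such a rule-based cascade, scanning synset pairs linked by I-hyponymy and counting how often each gap or mismatch type fires. The filter names, conditions and toy data are invented for exposition and are not the paper's actual rules.

```python
# Illustrative filter cascade over inter-lingual hyponymy links (hypothetical rules).
from collections import Counter

def grammatical_category_mismatch(pl_synset, en_synset):
    # Linked synsets belong to different parts of speech.
    return pl_synset["pos"] != en_synset["pos"]

def lexical_gap(pl_synset, en_synset):
    # The English synset has no single-word Polish counterpart (assumed annotation).
    return en_synset.get("single_word_pl_equivalent") is False

FILTERS = [
    ("grammatical-category mismatch", grammatical_category_mismatch),
    ("lexical gap", lexical_gap),
]

def scan_hyponymy_links(links):
    """Count how many I-hyponymy links each filter flags."""
    counts = Counter()
    for pl_synset, en_synset in links:
        for name, rule in FILTERS:
            if rule(pl_synset, en_synset):
                counts[name] += 1
    return counts

links = [
    ({"pos": "n"}, {"pos": "v", "single_word_pl_equivalent": False}),
    ({"pos": "n"}, {"pos": "n", "single_word_pl_equivalent": True}),
]
print(scan_hyponymy_links(links))
```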