Arkadiusz Janz


2021

Discriminating Homonymy from Polysemy in Wordnets: English, Spanish and Polish Nouns
Arkadiusz Janz | Marek Maziarz
Proceedings of the 11th Global Wordnet Conference

We propose a novel method of homonymy-polysemy discrimination for three Indo-European languages (English, Spanish and Polish). Support vector machines and LASSO logistic regression were successfully used for this task, outperforming the baselines. The feature set used lemma properties, gloss similarities, graph distances and polysemy patterns. The proposed ML models performed equally well on English and on the other two languages (which served as test datasets). The algorithms not only ruled out most cases of homonymy but were also effective in distinguishing between close and indirect semantic relatedness.
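
The abstract lists gloss similarities and graph distances among the features for a pair of senses. As an illustration only, the sketch below computes two such pairwise features, assuming a plain token-overlap (Jaccard) gloss similarity and an unweighted breadth-first shortest path over wordnet relations; these measures, the `-1.0` sentinel for unreachable senses and all function names are hypothetical, not necessarily what the paper used. The resulting vectors would then be fed to a classifier such as LASSO logistic regression (not shown).

```python
from collections import deque


def gloss_overlap(gloss_a, gloss_b):
    """Jaccard overlap of lower-cased whitespace tokens (assumed similarity measure)."""
    a = set(gloss_a.lower().split())
    b = set(gloss_b.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0


def graph_distance(graph, source, target):
    """Shortest-path length in edges between two senses; None if unreachable.

    graph: dict mapping a sense id to a list of related sense ids (assumed format).
    """
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for nb in graph.get(node, []):
            if nb == target:
                return dist + 1
            if nb not in seen:
                seen.add(nb)
                queue.append((nb, dist + 1))
    return None


def pair_features(graph, glosses, sense_a, sense_b):
    """Hypothetical feature vector for one sense pair: [gloss overlap, graph distance]."""
    d = graph_distance(graph, sense_a, sense_b)
    # -1.0 marks senses that are not connected at all (an assumed convention).
    return [gloss_overlap(glosses[sense_a], glosses[sense_b]), -1.0 if d is None else float(d)]
```

For two unrelated senses of "bank" with no connecting path, this yields a low gloss overlap and the unreachable marker, the kind of evidence that points towards homonymy rather than polysemy.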

Neural Language Models vs Wordnet-based Semantically Enriched Representation in CST Relation Recognition
Arkadiusz Janz | Maciej Piasecki | Piotr Wątorski
Proceedings of the 11th Global Wordnet Conference

Neural language models, including transformer-based models, pre-trained on very large corpora have become a common way to represent text in various tasks, including the recognition of textual semantic relations, e.g. those defined by Cross-document Structure Theory (CST). Pre-trained models are usually fine-tuned to downstream tasks, and the obtained vectors are used as input for deep neural classifiers. No linguistic knowledge obtained from resources and tools is utilised. In this paper we compare such universal approaches with a combination of a rich, graph-based, linguistically motivated sentence representation and a typical neural network classifier, applied to the task of recognising CST relations in Polish. The representation describes selected levels of the sentence structure, including a description of lexical meanings based on wordnet (plWordNet) synsets and the connected SUMO concepts. The results show that for difficult relations and a medium-sized training corpus, the semantically enriched text representation leads to significantly better results.

2020

Brand-Product Relation Extraction Using Heterogeneous Vector Space Representations
Arkadiusz Janz | Łukasz Kopociński | Maciej Piasecki | Agnieszka Pluwak
Proceedings of the 12th Language Resources and Evaluation Conference

Relation Extraction is a fundamental NLP task. In this paper we investigate the impact of the underlying text representation on the performance of neural classification models in the task of Brand-Product relation extraction. We also present a methodology for preparing annotated text corpora for this task and provide insight into the properties of Brand-Product relations occurring in text corpora. The problem is approached from the practical angle of applying Relation Extraction to commercial Internet monitoring.

2019

Propagation of emotions, arousal and polarity in WordNet using Heterogeneous Structured Synset Embeddings
Jan Kocoń | Arkadiusz Janz
Proceedings of the 10th Global Wordnet Conference

In this paper we present a novel method for emotive propagation in a wordnet based on a large emotive seed. We introduce a sense-level emotive lexicon annotated with polarity, arousal and emotions. The data were annotated as part of a large study involving over 20,000 participants. A total of 30,000 lexical units in Polish WordNet were described with metadata; each unit received about 50 annotations concerning polarity, arousal and 8 basic emotions, marked on a multilevel scale. We present a preliminary approach to propagating emotive metadata to unlabeled lexical units, based on the distribution of manual annotations, using logistic regression and mixed synset embeddings derived from our Heterogeneous Structured Synset Embeddings.

Testing Zipf’s meaning-frequency law with wordnets as sense inventories
Francis Bond | Arkadiusz Janz | Marek Maziarz | Ewa Rudnicka
Proceedings of the 10th Global Wordnet Conference

According to George K. Zipf, more frequent words have more senses. We tested this law using corpora and wordnets of English, Spanish, Portuguese, French, Polish, Japanese, Indonesian and Chinese. We showed that the law holds fairly well for all of these languages if we take, as Zipf did, mean values of meaning count and averaged ranks. On the other hand, the law fails disastrously in predicting the number of senses of a single lemma. We also provide evidence that the slope coefficients of the Zipfian log-log linear model may vary from language to language.
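
The abstract's procedure (averaging sense counts over rank bins, then fitting a log-log linear model) can be sketched as follows. This is a minimal illustration under stated assumptions: lemmas are given as hypothetical `(frequency, sense_count)` pairs, ranks are grouped into fixed-size bins (the `bin_size` parameter is an assumption), and the slope is an ordinary least-squares fit of log mean sense count against log mean rank. Zipf's classical form, roughly m ∝ f^(1/2), corresponds to a slope of about -0.5 against rank; the paper reports that the fitted slope varies by language.

```python
import math


def binned_means(lemmas, bin_size):
    """Average rank and average sense count per bin of frequency-ranked lemmas.

    lemmas: list of (frequency, sense_count) pairs (assumed input format).
    """
    ordered = sorted(lemmas, key=lambda p: p[0], reverse=True)  # rank 1 = most frequent
    mean_ranks, mean_senses = [], []
    for start in range(0, len(ordered), bin_size):
        chunk = ordered[start:start + bin_size]
        ranks = range(start + 1, start + 1 + len(chunk))
        mean_ranks.append(sum(ranks) / len(chunk))
        mean_senses.append(sum(s for _, s in chunk) / len(chunk))
    return mean_ranks, mean_senses


def loglog_slope(ranks, mean_senses):
    """OLS slope and intercept of log(mean sense count) against log(rank)."""
    xs = [math.log(r) for r in ranks]
    ys = [math.log(m) for m in mean_senses]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    slope = cov / var
    return slope, my - slope * mx
```

Fitting on binned means rather than on individual lemmas mirrors the abstract's observation: the averaged relation is well behaved, while per-lemma sense counts scatter too widely to be predicted from rank alone.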

A Comparison of Sense-level Sentiment Scores
Francis Bond | Arkadiusz Janz | Maciej Piasecki
Proceedings of the 10th Global Wordnet Conference

In this paper, we compare a variety of sense-tagged sentiment resources, including SentiWordNet, ML-Senticon, plWordNet emo and the NTU Multilingual Corpus. The goal is to investigate the quality of the resources and see how well the sentiment polarity annotation maps across languages.

Word Sense Disambiguation based on Constrained Random Walks in Linked Semantic Networks
Arkadiusz Janz | Maciej Piasecki
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)

Word Sense Disambiguation remains a challenging NLP task. Due to the lack of annotated training data, especially for rare senses, supervised approaches are usually designed for specific subdomains limited to a narrow subset of identified senses. Recent advances in this area have shown that knowledge-based approaches are more scalable and obtain more promising results in all-words WSD scenarios. In this work we present a faster WSD algorithm based on a Monte Carlo approximation of sense probabilities given a context, using constrained random walks over linked semantic networks. We show that local semantic relatedness is mostly sufficient to identify correct senses when an extensive knowledge base and a proper weighting scheme are used. The proposed methods are evaluated on English (Senseval, SemEval) and Polish (Składnica, KPWr) datasets.
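
To make the idea of a Monte Carlo approximation of sense probabilities concrete, here is a minimal sketch, not the authors' algorithm. It assumes the semantic network is a plain adjacency dict, walks start from uniformly chosen context nodes, and the "constraints" are simply a maximum walk length (`max_steps`) and stopping at the first candidate sense reached; all names and parameters are hypothetical. The estimated probability of a sense is the share of successful walks that ended on it.

```python
import random
from collections import Counter


def random_walk_sense_probs(graph, context_nodes, candidate_senses,
                            n_walks=1000, max_steps=4, seed=0):
    """Estimate P(sense | context) from truncated random walks (illustrative sketch).

    graph: dict mapping a node to a list of neighbour nodes (assumed format).
    """
    rng = random.Random(seed)
    candidates = set(candidate_senses)
    hits = Counter()
    for _ in range(n_walks):
        node = rng.choice(context_nodes)
        for _ in range(max_steps):  # walk-length constraint (assumed)
            neighbours = graph.get(node, [])
            if not neighbours:
                break
            node = rng.choice(neighbours)  # uniform weighting (assumed)
            if node in candidates:
                hits[node] += 1  # stop at the first candidate sense reached
                break
    total = sum(hits.values())
    if total == 0:
        # No walk reached any candidate: fall back to a uniform distribution.
        return {s: 1.0 / len(candidates) for s in candidate_senses}
    return {s: hits[s] / total for s in candidate_senses}
```

A weighting scheme, which the abstract identifies as important, could replace the uniform `rng.choice` with `rng.choices(neighbours, weights=...)` over relation-specific edge weights.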

2018

Wordnet-based Evaluation of Large Distributional Models for Polish
Maciej Piasecki | Gabriela Czachor | Arkadiusz Janz | Dominik Kaszewski | Paweł Kędzia
Proceedings of the 9th Global Wordnet Conference

The paper presents the construction of large-scale test datasets for word embeddings on the basis of a very large wordnet. The datasets were then used to evaluate word embedding models and to assess and compare the usefulness of different word embeddings extracted from a very large corpus of Polish. We also analysed and compared several publicly available models described in the literature. In addition, several large word embedding models built on a very large Polish corpus are presented.

Recognition of Hyponymy and Meronymy Relations in Word Embeddings for Polish
Gabriela Czachor | Maciej Piasecki | Arkadiusz Janz
Proceedings of the 9th Global Wordnet Conference

Word embeddings have been used for the extraction of the hyponymy relation in several approaches, but it has also recently been shown that they should not, in fact, work for this purpose. In our work we verified both claims using a very large wordnet of Polish as a gold standard for lexico-semantic relations and word embeddings extracted from a very large corpus of Polish. We showed that a hyponymy extraction method based on linear regression classifiers trained on clusters of vectors can be successfully applied at a large scale. We also presented a possible explanation for the contradictory findings in the literature. Moreover, to show the feasibility of the method, we extended it to the recognition of meronymy.

Context-sensitive Sentiment Propagation in WordNet
Jan Kocoń | Arkadiusz Janz | Maciej Piasecki
Proceedings of the 9th Global Wordnet Conference

In this paper we present a comprehensive overview of recent methods of sentiment propagation in a wordnet. Next, we propose a fully automated method called Classifier-based Polarity Propagation, which uses a very rich set of features, most of which are based on wordnet relation types, multi-level bag-of-synsets and bag-of-polarities. We evaluated our solution using the manually annotated part of plWordNet 3.1 emo, which contains more than 83k manual sentiment annotations covering more than 41k synsets. We demonstrated that, starting from the same seed synsets, our method achieved statistically significantly better results than existing rule-based methods that use a specific, narrow set of semantic relations.
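
The abstract mentions multi-level bag-of-polarities features computed over wordnet relation types. One plausible reading, sketched here purely as an assumption, counts the known polarity labels of a synset's neighbours reached through each relation type, separately for each distance level up to `max_level`. The input formats and the `(relation, level, polarity)` feature keys are hypothetical; the resulting counts would be one part of the feature vector passed to a classifier (not shown).

```python
from collections import Counter


def bag_of_polarities(synset, relations, polarity, max_level=2):
    """Count polarity labels of neighbours, keyed by (relation, level, polarity).

    relations: dict relation_type -> dict synset -> list of related synsets (assumed).
    polarity: dict synset -> polarity label for already annotated (seed) synsets.
    """
    features = Counter()
    for rel, edges in relations.items():
        frontier = {synset}
        seen = {synset}
        for level in range(1, max_level + 1):
            # Expand one step along this relation type only.
            nxt = set()
            for node in frontier:
                for nb in edges.get(node, []):
                    if nb not in seen:
                        seen.add(nb)
                        nxt.add(nb)
            for nb in nxt:
                if nb in polarity:
                    features[(rel, level, polarity[nb])] += 1
            frontier = nxt
    return features
```

Keeping relation type and level in the key lets a classifier learn, for instance, that a positive near-synonym is stronger evidence than a positive synset two hypernymy steps away, which a fixed rule over a narrow relation set cannot express.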

Classifier-based Polarity Propagation in a WordNet
Jan Kocoń | Arkadiusz Janz | Maciej Piasecki
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2017

Graph-Based Approach to Recognizing CST Relations in Polish Texts
Paweł Kędzia | Maciej Piasecki | Arkadiusz Janz
Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017

This paper presents a supervised approach to the recognition of Cross-document Structure Theory (CST) relations in Polish texts. In the proposed approach, a graph-based representation is constructed for each sentence. Graphs are built on the basis of lexicalised syntactic-semantic relations extracted from text. Similarity between sentences is calculated from their graphs, and the similarity values are input to classifiers trained with Logistic Model Trees. Several graph configurations, as well as graph similarity methods, were analysed for this task. The approach was evaluated on a large open corpus manually annotated with 17 types of selected CST relations. The experimental setup was similar to those known from SemEval, and we obtained very promising results.
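
As a rough illustration of turning two sentence graphs into similarity values for a classifier, the sketch below assumes each graph is a set of hypothetical `(head_lemma, relation, dependent_lemma)` edges and uses Jaccard overlap over nodes, unlabelled edges and labelled edges. Jaccard is a stand-in chosen for simplicity; the graph similarity methods analysed in the paper are not specified here.

```python
def graph_similarity_features(g1, g2):
    """Jaccard-based similarity features for two sentence graphs (illustrative sketch).

    g1, g2: sets of (head_lemma, relation, dependent_lemma) edges (assumed format).
    """
    def jaccard(a, b):
        if not a and not b:
            return 1.0  # two empty graphs are treated as identical (assumed)
        return len(a & b) / len(a | b)

    nodes1 = {n for h, _, d in g1 for n in (h, d)}
    nodes2 = {n for h, _, d in g2 for n in (h, d)}
    unlabelled1 = {(h, d) for h, _, d in g1}
    unlabelled2 = {(h, d) for h, _, d in g2}
    return {
        "node_jaccard": jaccard(nodes1, nodes2),
        "edge_jaccard": jaccard(unlabelled1, unlabelled2),
        "labelled_edge_jaccard": jaccard(set(g1), set(g2)),
    }
```

Each sentence pair then becomes a small numeric vector that a tree-based classifier can split on; replacing lemmas with wordnet synsets would let paraphrases such as "company" and "firm" match.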