Wiktor Walentynowicz


2022

Towards a contextualised spatial-diachronic history of literature: mapping emotional representations of the city and the country in Polish fiction from 1864 to 1939
Agnieszka Karlińska | Cezary Rosiński | Jan Wieczorek | Patryk Hubar | Jan Kocoń | Marek Kubis | Stanisław Woźniak | Arkadiusz Margraf | Wiktor Walentynowicz
Proceedings of the 6th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature

In this article, we discuss the conditions surrounding the building of historical and literary corpora. We describe the assumptions behind our original corpus of Polish novels (1864-1939) and the method used to build it. Then, we present a research procedure aimed at demonstrating how the emotional value of the concepts of “the city” and “the country” varies across the texts included in our corpus. The proposed method takes into account the complex socio-political nature of Central and Eastern Europe, especially the fact that there was no unified Polish state during this period. The method can be easily replicated in studies of the literature of countries with similar characteristics.

2021

Enriching plWordNet with morphology
Agnieszka Dziob | Wiktor Walentynowicz
Proceedings of the 11th Global Wordnet Conference

In this paper, we present the process of adding morphological information to the Polish WordNet (plWordNet). We describe the reasons for this connection and the intuitions behind it. We also draw attention to the specific features of Polish morphology. We show in which tasks morphological information is important and how existing methods can be extended to include combined morphological information based on WordNet.

2019

Tagger for Polish Computer Mediated Communication Texts
Wiktor Walentynowicz | Maciej Piasecki | Marcin Oleksy
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)

In this paper, we present a morpho-syntactic tagger dedicated to Computer-mediated Communication (CMC) texts in Polish. Its construction is based on an expanded RNN-based neural network adapted to noisy texts. Among several techniques, the tagger utilises fastText embedding vectors, sequential character embedding vectors, and Brown clustering for the coarse-grained representation of sentence structures. In addition, a set of manually written rules was proposed for post-processing. The system was trained to disambiguate word descriptions with respect to part-of-speech tags together with the full morphological information, expressed as values of the different grammatical categories. We also present an evaluation of several model variants on the gold-standard annotated CMC data, a comparison with the state-of-the-art taggers for Polish, and an error analysis. The proposed tagger shows significantly better results in this domain and demonstrates the viability of adaptation.