2022
Towards a contextualised spatial-diachronic history of literature: mapping emotional representations of the city and the country in Polish fiction from 1864 to 1939
Agnieszka Karlińska | Cezary Rosiński | Jan Wieczorek | Patryk Hubar | Jan Kocoń | Marek Kubis | Stanisław Woźniak | Arkadiusz Margraf | Wiktor Walentynowicz
Proceedings of the 6th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature
In this article, we discuss the conditions surrounding the building of historical and literary corpora. We describe the assumptions and method behind the creation of our original corpus of the Polish novel (1864-1939). Then, we present the research procedure aimed at demonstrating how the emotional value of the concepts of “the city” and “the country” varies across the texts included in our corpus. The proposed method takes into account the complex socio-political situation of Central and Eastern Europe, especially the fact that there was no unified Polish state during this period. The method can be easily replicated in studies of the literature of countries with similar characteristics.
DiaBiz.Kom - towards a Polish Dialogue Act Corpus Based on ISO 24617-2 Standard
Marcin Oleksy | Jan Wieczorek | Dorota Drużyłowska | Julia Klyus | Aleksandra Domogała | Krzysztof Hwaszcz | Hanna Kędzierska | Daria Mikoś | Anita Wróż
Proceedings of the 29th International Conference on Computational Linguistics
This article presents the specification and evaluation of DiaBiz.Kom, a corpus of dialogue texts in Polish. The corpus contains transcriptions of telephone conversations conducted according to a prepared scenario. The transcripts have been manually annotated with a layer of information concerning communicative functions. DiaBiz.Kom is the first corpus of this type prepared for the Polish language and will be used to develop a dialogue analysis system and modules for creating advanced chatbots.
2020
PST 2.0 – Corpus of Polish Spatial Texts
Michał Marcińczuk | Marcin Oleksy | Jan Wieczorek
Proceedings of the Twelfth Language Resources and Evaluation Conference
In the paper, we focus on modeling spatial expressions in texts. We present the guidelines used to annotate PST 2.0 (Corpus of Polish Spatial Texts), a corpus designed for training and testing tools for spatial expression recognition. The corpus contains texts collected from travel blogs available under a Creative Commons license. We have defined our guidelines based on three existing specifications for English (SpatialML, Spatial Role Labelling from SemEval-2013 Task 3, and ISO-Space 1.4 from SpaceEval 2014). We briefly present the existing specifications and discuss the modifications made to adapt the guidelines to the characteristics of the Polish language. We also describe the process of data collection and manual annotation, including the calculation of inter-annotator agreement. Finally, we present detailed statistics of the PST 2.0 corpus, which include the number of components, relations, expressions, and the most common values of spatial indicators, motion indicators, path indicators, distances, directions, and regions.