Aleksandra Zwierzchowska
2026
POLAR: A Corpus of Questions, Responses and Argumentation in Polish Political Radio Discourse
Daniel Ziembicki | Aleksandra Zwierzchowska | Ewelina Sobol | Katarzyna Anna Przerada
Proceedings of the Fifteenth Language Resources and Evaluation Conference
In this paper, we present POLAR, an experimental dataset designed to investigate question–answer structures in political interviews. The study also aims to integrate this level of annotation with the identification of argumentative structures. The dataset comprises orthographic transcriptions of Polish political radio interviews conducted between December 2023 and March 2024, totaling nearly 10 hours of recordings (94,015 tokens). Manual annotation was performed on three levels: (a) identification of questions as speech acts, (b) classification of responses to questions, and (c) annotation of argumentative structures in which interrogative sentences function as premises or conclusions. The results show that not all interrogative sentences function as questions in the sense of requesting information: 23% do not serve this function, while 13% were identified as components of argumentative structures. We also introduce a gold-standard corpus, together with baseline experiments and LLM-based evaluations, demonstrating the usefulness of the resource for both theoretical research and NLP applications.
2024
Polish Discourse Corpus (PDC): Corpus Design, ISO-Compliant Annotation, Data Highlights, and Parser Development
Maciej Ogrodniczuk | Aleksandra Tomaszewska | Daniel Ziembicki | Sebastian Żurowski | Ryszard Tuora | Aleksandra Zwierzchowska
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
This paper presents the Polish Discourse Corpus, a pioneering resource of its kind for Polish and the first corpus in Poland to employ the ISO standard for discourse relation annotation. The corpus adopts ISO 24617-8, a part of the Language Resource Management – Semantic Annotation Framework (SemAF), which outlines a set of core discourse relations adaptable to diverse languages and genres. The paper gives an overview of the corpus architecture, the annotation procedures, and the challenges the annotators encountered, as well as key statistics on discourse relations and connectives in the corpus. It further discusses the initial phases of a discourse parser tailored to the ISO 24617-8 framework. Evaluations of the efficacy of the corpus annotation and parsing strategies, and of areas for potential refinement, are also presented. The final part of the paper outlines planned research to improve discourse analysis techniques in the project and to conduct discourse studies involving multiple languages.