Meden Katja

2024

pdf abs
Multilingual Power and Ideology identification in the Parliament: a reference dataset and simple baselines
Çağrı Çöltekin | Matyáš Kopp | Meden Katja | Vaidas Morkevicius | Nikola Ljubešić | Tomaž Erjavec
Proceedings of the IV Workshop on Creating, Analysing, and Increasing Accessibility of Parliamentary Corpora (ParlaCLARIN) @ LREC-COLING 2024

We introduce a dataset on political orientation and power position identification. The dataset is derived from ParlaMint, a set of comparable corpora of transcribed parliamentary speeches from 29 national and regional parliaments. We introduce the dataset, provide the reasoning behind some of the choices during its creation, present statistics on the dataset, and, using a simple classifier, some baseline results on predicting political orientation on the left-to-right axis, and on power position identification, i.e., distinguishing between the speeches delivered by governing coalition party members from those of opposition party members.

2022

In ParlaMint I, a CLARIN-ERIC supported project in pandemic times, a set of comparable and uniformly annotated multilingual corpora for 17 national parliaments were developed and released in 2021. For 2022 and 2023, the project has been extended to ParlaMint II, again with the CLARIN ERIC financial support, in order to enhance the existing corpora with new data and metadata; upgrade the XML schema; add corpora for 10 new parliaments; provide more application scenarios and carry out additional experiments. The paper reports on these planned steps, including some that have already been taken, and outlines future plans.

Co-authors

Maciej Ogrodniczuk 1

Petya Osenova 1

Darja Fišer 1

Venues

parlaclarin2
ws1