Andreas Blaette


How GermaParl Evolves: Improving Data Quality by Reproducible Corpus Preparation and User Involvement
Andreas Blaette | Julia Rakers | Christoph Leonhardt
Proceedings of the Workshop ParlaCLARIN III within the 13th Language Resources and Evaluation Conference

The development and curation of large-scale corpora of plenary debates require not only care and attention to detail when the data is created but also effective means of sustainable quality control. This paper makes two contributions: First, it presents an updated version of the GermaParl corpus of parliamentary debates in the German *Bundestag*. Second, it shows how the corpus preparation pipeline is designed to serve the quality of the resource by facilitating effective community involvement. Centered around a workflow which combines reproducibility, transparency and version control, the pipeline allows for continuous improvements to the corpus.


The Europeanization of Parliamentary Debates on Migration in Austria, France, Germany, and the Netherlands
Andreas Blaette | Simon Gehlhar | Christoph Leonhardt
Proceedings of the Second ParlaCLARIN Workshop

Corpora of plenary debates in national parliaments are available for many European states. For comparative research on political discourse, a persistent problem is that the periods covered by the corpora differ and that a lack of standardized data formats inhibits their integration into a single analytical framework. The solution we pursue is a ‘Framework for Parsing Plenary Protocols’ (frappp), which has been used to prepare corpora of the Assemblée Nationale (“ParisParl”), the German Bundestag (“GermaParl”), the Tweede Kamer of the Netherlands (“TweedeTwee”), and the Austrian Nationalrat (“AustroParl”) for the first two decades of the 21st century (2000–2019). To demonstrate the usefulness of the resulting data, we investigate the Europeanization of migration debates in these Western European countries of immigration, i.e. references to a European dimension of policy-making in speeches on migration and integration. Based on a segmentation of the corpora into speeches, we apply topic modeling and analyze the joint occurrence of topics indicating migration and European affairs, respectively. A major finding is that after 2015, we see an increasing Europeanization of migration debates in the small EU member states in our sample (Austria and the Netherlands), and a decline in Europeanization in France and, more notably, in Germany.