New Developments in the Polish Parliamentary Corpus

Maciej Ogrodniczuk, Bartłomiej Nitoń


Abstract
This short paper presents the current (as of February 2020) state of preparation of the Polish Parliamentary Corpus (PPC), an extensive collection of transcripts of Polish parliamentary proceedings dating from 1919 to the present. The most evident developments compared to the 2018 version are the harmonization of metadata, the standardization of document identifiers, the uploading of the contents of all documents and metadata to the database (to enable easier modification, maintenance and future development of the corpus), the linking of utterances to the political ontology, the linking of corpus texts to source data, and the processing of historical documents.
Anthology ID:
2020.parlaclarin-1.1
Volume:
Proceedings of the Second ParlaCLARIN Workshop
Month:
May
Year:
2020
Address:
Marseille, France
Venue:
ParlaCLARIN
Publisher:
European Language Resources Association
Pages:
1–4
Language:
English
URL:
https://aclanthology.org/2020.parlaclarin-1.1
Cite (ACL):
Maciej Ogrodniczuk and Bartłomiej Nitoń. 2020. New Developments in the Polish Parliamentary Corpus. In Proceedings of the Second ParlaCLARIN Workshop, pages 1–4, Marseille, France. European Language Resources Association.
Cite (Informal):
New Developments in the Polish Parliamentary Corpus (Ogrodniczuk & Nitoń, ParlaCLARIN 2020)
PDF:
https://preview.aclanthology.org/paclic-22-ingestion/2020.parlaclarin-1.1.pdf