There’s Something New about the Italian Parliament: The IPSA Corpus

Valentino Frasnelli, Alessio Palmero Aprosio


Abstract
Parliamentary debates constitute a substantial and somewhat underutilized reservoir of publicly available written content. Despite their potential, Italian parliamentary documents remain largely unexplored and, most importantly, inaccessible in their original paper-based form. In this paper we attempt to transform these valuable historical documents into IPSA, a digitally readable, structured corpus containing speeches, reports of the Standing Committees, and law proposals spanning 175 years of Italian history, from the issuing of the Statuto Albertino in 1848 up to the present day. First, the PDF documents, available on the official websites of the Senato della Repubblica and the Camera dei Deputati, the two chambers that form the Italian Parliament, are digitized using Optical Character Recognition (OCR) techniques. Then, the speeches are tagged with their corresponding speakers. The final dataset is released in both textual and structured formats.
Anthology ID:
2024.lrec-main.1394
Volume:
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Month:
May
Year:
2024
Address:
Torino, Italia
Editors:
Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
Venues:
LREC | COLING
Publisher:
ELRA and ICCL
Pages:
16037–16046
URL:
https://aclanthology.org/2024.lrec-main.1394
Cite (ACL):
Valentino Frasnelli and Alessio Palmero Aprosio. 2024. There’s Something New about the Italian Parliament: The IPSA Corpus. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 16037–16046, Torino, Italia. ELRA and ICCL.
Cite (Informal):
There’s Something New about the Italian Parliament: The IPSA Corpus (Frasnelli & Palmero Aprosio, LREC-COLING 2024)
PDF:
https://preview.aclanthology.org/ingest-2024-clasp/2024.lrec-main.1394.pdf