A Corpus of Spanish Political Speeches from 1937 to 2019

Elena Álvarez-Mellado


Abstract
This paper documents a corpus of political speeches in Spanish. The corpus consists of the Christmas speeches delivered yearly by the Spanish head of state since 1937. The historical period covered by these speeches runs from the Spanish Civil War and the Francoist dictatorship up to the present day. As a result, the corpus reflects some of the most significant events and political changes in the recent history of Spain. Until now, the speeches had not been collected into a single, systematic and reusable resource, as most of the texts were scattered among different sources. The paper describes: (1) the composition of the corpus; (2) the Python interface that facilitates querying and analyzing the corpus using the NLTK and spaCy libraries; and (3) a set of HTML visualizations aimed at the general public for navigating the corpus and exploring differences in TF-IDF scores.
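The TF-IDF comparison mentioned in the abstract can be illustrated with a minimal, standard-library sketch. It assumes each speech is available as a plain-text string. The helper names `tokenize`, `tf_idf` and `top_terms` and the two toy texts are hypothetical: they are not part of the corpus's Python interface and are not quoted from the corpus. The weighting shown (term count divided by document length, times log(N/df)) is one common TF-IDF variant; the paper's visualizations may use a different one.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Hypothetical naive tokenizer (lowercase + whitespace split);
    # the corpus interface relies on NLTK or spaCy instead.
    return text.lower().split()


def tf_idf(docs: dict[str, str]) -> dict[str, dict[str, float]]:
    """Hypothetical helper: TF-IDF weights per document.

    tf = term count / document length, idf = log(N / df).
    """
    tokenized = {name: tokenize(text) for name, text in docs.items()}
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term at least once.
    df = Counter()
    for tokens in tokenized.values():
        df.update(set(tokens))
    weights = {}
    for name, tokens in tokenized.items():
        counts = Counter(tokens)
        total = len(tokens)
        weights[name] = {
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        }
    return weights


def top_terms(weights: dict[str, float], k: int = 3) -> list[str]:
    # Highest-weighted terms first; ties broken alphabetically for stable output.
    ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


if __name__ == "__main__":
    # Invented toy texts standing in for two speeches; not corpus data.
    speeches = {
        "speech_a": "paz y unidad paz para todos",
        "speech_b": "crisis y empleo para todos",
    }
    w = tf_idf(speeches)
    for name in speeches:
        print(name, top_terms(w[name]))
```

Terms shared by every document receive an IDF of zero, so words that distinguish one speech from the others (here `paz` and `unidad` versus `crisis` and `empleo`) rise to the top of each ranking.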
Anthology ID: 2020.lrec-1.116
Volume: Proceedings of the 12th Language Resources and Evaluation Conference
Month: May
Year: 2020
Address: Marseille, France
Venue: LREC
Publisher: European Language Resources Association
Pages: 928–932
Language: English
URL: https://aclanthology.org/2020.lrec-1.116
Cite (ACL): Elena Álvarez-Mellado. 2020. A Corpus of Spanish Political Speeches from 1937 to 2019. In Proceedings of the 12th Language Resources and Evaluation Conference, pages 928–932, Marseille, France. European Language Resources Association.
Cite (Informal): A Corpus of Spanish Political Speeches from 1937 to 2019 (Álvarez-Mellado, LREC 2020)
PDF: https://preview.aclanthology.org/update-css-js/2020.lrec-1.116.pdf