Abstract
This paper documents a corpus of political speeches in Spanish. The corpus comprises the Christmas speeches that the head of state of Spain has delivered yearly since 1937. The historical period covered by these speeches ranges from the Spanish Civil War and the Francoist dictatorship to the present day. As a result, the corpus reflects some of the most significant events and political changes in the recent history of Spain. Until now, the speeches had not been collected into a single, systematic and reusable resource, as most of the texts were scattered among different sources. The paper describes: (1) the composition of the corpus; (2) the Python interface that facilitates querying and analyzing the corpus using the NLTK and spaCy libraries; and (3) a set of HTML visualizations aimed at the general public for navigating the corpus and exploring differences between TF-IDF frequencies.
- Anthology ID:
- 2020.lrec-1.116
- Volume:
- Proceedings of the Twelfth Language Resources and Evaluation Conference
- Month:
- May
- Year:
- 2020
- Address:
- Marseille, France
- Editors:
- Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
- Venue:
- LREC
- Publisher:
- European Language Resources Association
- Pages:
- 928–932
- Language:
- English
- URL:
- https://aclanthology.org/2020.lrec-1.116
- Cite (ACL):
- Elena Álvarez-Mellado. 2020. A Corpus of Spanish Political Speeches from 1937 to 2019. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 928–932, Marseille, France. European Language Resources Association.
- Cite (Informal):
- A Corpus of Spanish Political Speeches from 1937 to 2019 (Álvarez-Mellado, LREC 2020)
- PDF:
- https://preview.aclanthology.org/naacl24-info/2020.lrec-1.116.pdf