Word Clustering for Historical Newspapers Analysis

Lidia Pivovarova, Elaine Zosa, Jani Marjanen


Abstract
This paper is part of a collaboration between computer scientists and historians aimed at developing novel tools and methods to improve the analysis of historical newspapers. We present a case study of ideological terms ending with the suffix -ism in nineteenth-century Finnish newspapers. We propose a two-step procedure to trace differences in word usage over time: first, training diachronic embeddings on several time slices; then, clustering the embeddings of selected words together with their neighbours to obtain historical context. The resulting clusters turn out to be useful for historical studies. The paper also discusses specific difficulties related to developing historian-oriented tools.
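The second step of the procedure described in the abstract can be illustrated with a minimal sketch. It assumes that embeddings for one time slice have already been trained and loaded as a plain dict mapping each word to a vector. The toy vectors, the placeholder words, the neighbour count `k`, the helper names (`neighbours`, `kmeans_cosine`, `context_clusters`), and the use of a simple cosine-based k-means are all hypothetical choices for illustration. They are not the authors' data, code, or necessarily their clustering algorithm.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]
Embeddings = Dict[str, Vector]  # word -> vector for ONE time slice (assumed pre-trained)


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def neighbours(target: str, emb: Embeddings, k: int) -> List[str]:
    """Return the k words most cosine-similar to `target` within one slice."""
    scored = [(cosine(emb[target], vec), w) for w, vec in emb.items() if w != target]
    scored.sort(key=lambda p: (-p[0], p[1]))  # highest similarity first, ties broken by word
    return [w for _, w in scored[:k]]


def kmeans_cosine(words: List[str], emb: Embeddings, n_clusters: int,
                  iters: int = 20) -> List[List[str]]:
    """Hypothetical stand-in clusterer: k-means that assigns by cosine similarity."""
    # Deterministic init: the first word, then repeatedly the word least
    # similar to every centroid chosen so far.
    centroids = [list(emb[words[0]])]
    while len(centroids) < n_clusters:
        far = min(words, key=lambda w: max(cosine(emb[w], c) for c in centroids))
        centroids.append(list(emb[far]))
    assign: Dict[str, int] = {}
    for _ in range(iters):
        # Assign each word to its most similar centroid.
        new_assign = {w: max(range(n_clusters), key=lambda c: cosine(emb[w], centroids[c]))
                      for w in words}
        if new_assign == assign:
            break  # assignments stable: converged
        assign = new_assign
        # Recompute each centroid as the mean of its member vectors.
        for c in range(n_clusters):
            members = [emb[w] for w in words if assign[w] == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    groups = [[w for w in words if assign[w] == c] for c in range(n_clusters)]
    return [g for g in groups if g]


def context_clusters(target: str, emb: Embeddings, k: int, n_clusters: int) -> List[List[str]]:
    """Cluster the target word together with its k nearest neighbours."""
    words = [target] + neighbours(target, emb, k)
    return kmeans_cosine(words, emb, n_clusters)


# Hypothetical toy vectors for a single time slice (NOT data from the paper).
TOY_SLICE: Embeddings = {
    "target": [1.0, 1.0],
    "a1": [1.0, 0.1],
    "a2": [0.9, 0.0],
    "b1": [0.1, 1.0],
    "b2": [0.0, 0.9],
    "far": [-1.0, -1.0],
}

if __name__ == "__main__":
    # In a diachronic setting this call would be repeated for every time slice
    # and the resulting clusters compared across slices.
    print(context_clusters("target", TOY_SLICE, k=4, n_clusters=2))
```

Running the same function separately on each slice's embeddings is what makes the comparison diachronic: changes in which neighbours appear, and how they group, hint at shifts in a word's usage.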
Anthology ID:
W19-9002
Volume:
Proceedings of the Workshop on Language Technology for Digital Historical Archives
Month:
September
Year:
2019
Address:
Varna, Bulgaria
Editors:
Cristina Vertan, Petya Osenova, Dimitar Iliev
Venue:
RANLP
Publisher:
INCOMA Ltd.
Pages:
3–10
URL:
https://aclanthology.org/W19-9002
DOI:
10.26615/978-954-452-059-5_002
Cite (ACL):
Lidia Pivovarova, Elaine Zosa, and Jani Marjanen. 2019. Word Clustering for Historical Newspapers Analysis. In Proceedings of the Workshop on Language Technology for Digital Historical Archives, pages 3–10, Varna, Bulgaria. INCOMA Ltd.
Cite (Informal):
Word Clustering for Historical Newspapers Analysis (Pivovarova et al., RANLP 2019)
PDF:
https://preview.aclanthology.org/ml4al-ingestion/W19-9002.pdf