Characterizing Linguistic Shifts in Croatian News via Diachronic Word Embeddings

David Dukić, Ana Barić, Marko Čuljak, Josip Jukić, Martin Tutek


Abstract
Measuring how the semantics of words change over time improves our understanding of how cultures and perspectives change. Diachronic word embeddings help us quantify this shift, although previous studies leveraged substantial temporally annotated corpora. In this work, we use a corpus of 9.5 million Croatian news articles spanning the past 25 years and quantify semantic change using skip-gram word embeddings trained on five-year periods. Our analysis finds that word embeddings capture linguistic shifts of terms pertaining to major topics in this timespan (COVID-19, Croatia joining the European Union, technological advancements). We also find evidence that embeddings trained on post-2020 text encode increased positivity in sentiment analysis tasks, in contrast to studies reporting a decline in mental health over the same period.
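The abstract does not say how embeddings from different five-year periods are compared, so the following is only a minimal sketch of one common approach, not the authors' method: align the two embedding spaces with an orthogonal Procrustes rotation, then score each word by the cosine distance between its aligned vectors. The toy matrices `early` and `late`, the vocabulary size, and the function names are all hypothetical; in practice the rows would be skip-gram vectors for a shared vocabulary.

```python
import numpy as np


def align_procrustes(source, target):
    """Rotate `source` onto `target` with the orthogonal matrix R
    that minimizes ||source @ R - target|| in the Frobenius norm."""
    u, _, vt = np.linalg.svd(source.T @ target)
    return source @ (u @ vt)


def semantic_shift(source, target):
    """Cosine distance per word (row) after aligning source to target."""
    aligned = align_procrustes(source, target)
    a = aligned / np.linalg.norm(aligned, axis=1, keepdims=True)
    b = target / np.linalg.norm(target, axis=1, keepdims=True)
    return 1.0 - np.sum(a * b, axis=1)


# Toy stand-in for two periods (assumption: random vectors, not real embeddings).
rng = np.random.default_rng(0)
n_words, dim = 200, 8
early = rng.normal(size=(n_words, dim))

# The later space is the earlier one under an arbitrary rotation, which is
# what independently trained embedding spaces look like in the ideal case.
rotation, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
late = early @ rotation

# Simulate one word whose meaning changed: flip its vector in the later period.
late[0] = -late[0]

shift = semantic_shift(early, late)
most_changed = int(np.argmax(shift))
print("most changed word index:", most_changed)
```

Alignment is needed because separately trained embedding spaces are only comparable up to a rotation; ranking words by `shift` then surfaces candidates for semantic change.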
Anthology ID:
2025.bsnlp-1.13
Volume:
Proceedings of the 10th Workshop on Slavic Natural Language Processing (Slavic NLP 2025)
Month:
July
Year:
2025
Address:
Vienna, Austria
Editors:
Jakub Piskorski, Pavel Přibáň, Preslav Nakov, Roman Yangarber, Michal Marcinczuk
Venues:
BSNLP | WS
Publisher:
Association for Computational Linguistics
Pages:
108–115
URL:
https://preview.aclanthology.org/acl25-workshop-ingestion/2025.bsnlp-1.13/
Cite (ACL):
David Dukić, Ana Barić, Marko Čuljak, Josip Jukić, and Martin Tutek. 2025. Characterizing Linguistic Shifts in Croatian News via Diachronic Word Embeddings. In Proceedings of the 10th Workshop on Slavic Natural Language Processing (Slavic NLP 2025), pages 108–115, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
Characterizing Linguistic Shifts in Croatian News via Diachronic Word Embeddings (Dukić et al., BSNLP 2025)
PDF:
https://preview.aclanthology.org/acl25-workshop-ingestion/2025.bsnlp-1.13.pdf