Dialects, Topic Models, and Border Effects: The Rusyn Case

Achim Rabus, Yves Scherrer


Abstract
In this contribution, we present, discuss, and apply a data-driven approach for analyzing varieties of the Slavic minority language Carpathian Rusyn spoken in different countries in the Carpathian region. Using topic modeling, a method originally developed for text mining, we show that the Rusyn varieties are subject to border effects, i.e., vertical convergence and horizontal divergence, due to language contacts with their respective umbrella languages Polish, Slovak and Standard Ukrainian. Additionally, we show that the method is suitable for uncovering fieldworker isoglosses, i.e., different transcription principles in an otherwise homogeneous dataset.
Anthology ID: 2025.bsnlp-1.5
Volume: Proceedings of the 10th Workshop on Slavic Natural Language Processing (Slavic NLP 2025)
Month: July
Year: 2025
Address: Vienna, Austria
Editors: Jakub Piskorski, Pavel Přibáň, Preslav Nakov, Roman Yangarber, Michal Marcinczuk
Venues: BSNLP | WS
Publisher: Association for Computational Linguistics
Pages: 38–43
URL: https://preview.aclanthology.org/acl25-workshop-ingestion/2025.bsnlp-1.5/
Cite (ACL): Achim Rabus and Yves Scherrer. 2025. Dialects, Topic Models, and Border Effects: The Rusyn Case. In Proceedings of the 10th Workshop on Slavic Natural Language Processing (Slavic NLP 2025), pages 38–43, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal): Dialects, Topic Models, and Border Effects: The Rusyn Case (Rabus & Scherrer, BSNLP 2025)
PDF: https://preview.aclanthology.org/acl25-workshop-ingestion/2025.bsnlp-1.5.pdf