Restructuring and visualising dialect dictionary data: Report on Erzya and Moksha materials

Jack Rueter, Niko Partanen


Abstract
There are a number of Uralic dialect dictionaries based on fieldwork documentation of individual minority languages from the Pre-Soviet Era. The first of these published by the Finno-Ugrian Society features the Mordvin languages, Erzya and Moksha. In this article, we describe the possibility of reusing the collection point and phonetic variant data of an XML dialect dictionary to visualize informative linguistic isoglosses with the R programming language’s Shiny web application framework. We describe the ‘H. Paasonen Mordvin Dictionary’ to give the reader a better perspective on the data and challenges that minority language dialect dictionaries may present. We then describe how we processed our data, and we provide conclusions followed by a more extensive section on limitations. The conclusions state that only some of the data should be rendered with an R Shiny web application, whereas some data might be better rendered by other applications. Our limitations section calls for the extension of the dialect dictionary database for a more concise description of the language forms.
Anthology ID:
2025.nlp4dh-1.5
Volume:
Proceedings of the 5th International Conference on Natural Language Processing for Digital Humanities
Month:
May
Year:
2025
Address:
Albuquerque, USA
Editors:
Mika Hämäläinen, Emily Öhman, Yuri Bizzoni, So Miyagawa, Khalid Alnajjar
Venues:
NLP4DH | WS
Publisher:
Association for Computational Linguistics
Pages:
41–47
URL:
https://preview.aclanthology.org/fix-sig-urls/2025.nlp4dh-1.5/
Cite (ACL):
Jack Rueter and Niko Partanen. 2025. Restructuring and visualising dialect dictionary data: Report on Erzya and Moksha materials. In Proceedings of the 5th International Conference on Natural Language Processing for Digital Humanities, pages 41–47, Albuquerque, USA. Association for Computational Linguistics.
Cite (Informal):
Restructuring and visualising dialect dictionary data: Report on Erzya and Moksha materials (Rueter & Partanen, NLP4DH 2025)
PDF:
https://preview.aclanthology.org/fix-sig-urls/2025.nlp4dh-1.5.pdf