Michael Rießler


2024

Christian texts have been printed in Kola Saami languages since 1828; the most extensive publication is the Gospel of Matthew, of which three different translations have been published since 1878, most recently in 2022. The Lord’s Prayer has been translated in several further versions into Kildin Saami and Skolt Saami, first in 1828. All of these texts seem to go back to translations from Russian. These characteristics make the publications well suited for parallel text alignment. This paper describes ongoing work on building a Kola Saami Christian Text Corpus, including conceptual and technical decisions. It thus describes a resource rather than a study. However, computational studies based on these data will hopefully take place in the near future, after the Kildin Saami subset of this corpus is finished and published by the end of 2024. In addition to computational work, this resource will also allow for comparative linguistic studies on diachronic and synchronic variation and change in the Kola Saami languages, which are among the most endangered and least described Uralic languages.
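
Because verse numbering is shared across Bible translations, verse references can serve as natural anchors for such an alignment. The following is a minimal Python sketch under that assumption; the "chapter:verse"-per-line input format and the file names are illustrative, not the corpus's actual encoding.

# Verse-level alignment sketch: pair verses by (chapter, verse) keys.
import re

VERSE = re.compile(r"^(\d+):(\d+)\s+(.*)$")

def read_verses(path):
    """Return a mapping (chapter, verse) -> text for one translation."""
    verses = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            match = VERSE.match(line.strip())
            if match:
                chapter, verse, text = match.groups()
                verses[(int(chapter), int(verse))] = text
    return verses

def align(path_a, path_b):
    """Yield verse pairs attested in both translations, in canonical order."""
    a, b = read_verses(path_a), read_verses(path_b)
    for key in sorted(a.keys() & b.keys()):
        yield key, a[key], b[key]

# Example: pair a Kildin Saami Matthew translation with its Russian source.
# for (ch, v), saami, rus in align("matthew_kildin.txt", "matthew_rus.txt"):
#     print(ch, v, saami, rus, sep="\t")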

2020

In this paper, we expand on previous work on automatic speech recognition in a low-resource scenario typical of data collected by field linguists. We train DeepSpeech models on 35 hours of dialectal Komi speech recordings and correct the output using language models constructed from various sources. Previous experiments showed that transfer learning with DeepSpeech can improve the accuracy of a speech recognizer for Komi, though the error rate remained very high. In this paper we present further experiments with language models created using KenLM from text materials available online. Three such models are constructed: one from a corpus of literary texts, one from social media content, and one combining the two. We then train the speech recognition model with each language model in turn to explore the impact of the language model's data source on recognition quality. Our results show significant improvements of over 25% in character error rate and nearly 20% in word error rate. This offers important methodological insight into how ASR results can be improved under low-resource conditions: transfer learning can compensate for the lack of training data in the target language, and online texts are a very useful resource when developing language models in this context.
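
In the DeepSpeech pipeline the KenLM model is compiled into a scorer package and applied during decoding; as a self-contained illustration of the same idea, the sketch below re-ranks n-best recognizer output with a KenLM model built along the lines of "lmplz -o 3 < komi_texts.txt > komi.arpa". The file names, the weight, and the example scores are illustrative assumptions, not the paper's actual setup.

# N-best rescoring sketch with a KenLM language model (pip install kenlm).
import kenlm

def rescore(hypotheses, lm_path, lm_weight=0.5):
    """Re-rank ASR hypotheses by acoustic score plus weighted LM score.

    hypotheses: list of (text, acoustic_log_score) pairs from the recognizer.
    Returns the text of the best-scoring hypothesis.
    """
    lm = kenlm.Model(lm_path)
    def combined(pair):
        text, acoustic = pair
        # kenlm.Model.score returns a log10 probability for the sentence.
        return acoustic + lm_weight * lm.score(text, bos=True, eos=True)
    return max(hypotheses, key=combined)[0]

# Example with made-up hypotheses and acoustic scores:
# best = rescore([("hypothesis one", -4.2), ("hypothesis two", -4.5)], "komi.arpa")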

2018

Two Komi-Zyrian treebanks were included in the Universal Dependencies 2.2 release. This article contextualizes the treebanks, discusses the process through which they were created, and outlines future plans and a timeline for the next improvements. Special attention is paid to the possibilities of using UD in the documentation and description of endangered languages.
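
The released treebanks are plain CoNLL-U files and can be explored directly, for example with the conllu package from PyPI; the short sketch below counts universal POS tags. The file name follows the UD release naming scheme for the Komi-Zyrian Lattice treebank and should be checked against the actual release.

# CoNLL-U exploration sketch (pip install conllu).
from collections import Counter
from conllu import parse_incr

def count_upos(path):
    """Count universal POS tags over all sentences in a CoNLL-U file."""
    counts = Counter()
    with open(path, encoding="utf-8") as f:
        for sentence in parse_incr(f):
            counts.update(token["upos"] for token in sentence)
    return counts

# Example (file name per the UD naming convention; verify before use):
# print(count_upos("kpv_lattice-ud-test.conllu").most_common(10))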
