Ivan Ubaleht


2022

Development of the Siberian Ingrian Finnish Speech Corpus
Ivan Ubaleht | Taisto-Kalevi Raudalainen
Proceedings of the Fifth Workshop on the Use of Computational Methods in the Study of Endangered Languages

In this paper we present the speech corpus for the Siberian Ingrian Finnish language. The speech corpus includes audio data, annotations, software tools for data processing, two databases, and a web application. We have published part of the audio data and annotations. A software tool for parsing annotation files and populating a relational database has been developed and released under a free license. A web application has also been developed and is available. At present, about 300 words and 200 phrases can be displayed in this web application.