Lexicon Induction for Spoken Rusyn – Challenges and Results

Achim Rabus; Yves Scherrer

doi:10.18653/v1/W17-1405

Abstract

This paper reports on challenges and results in developing NLP resources for spoken Rusyn. Being a Slavic minority language, Rusyn does not have any resources to make use of. We propose to build a morphosyntactic dictionary for Rusyn, combining existing resources from the etymologically close Slavic languages Russian, Ukrainian, Slovak, and Polish. We adapt these resources to Rusyn by using vowel-sensitive Levenshtein distance, hand-written language-specific transformation rules, and combinations of the two. Compared to an exact match baseline, we increase the coverage of the resulting morphological dictionary by up to 77.4% relative (42.9% absolute), which results in a tagging recall increased by 11.6% relative (9.1% absolute). Our research confirms and expands the results of previous studies showing the efficiency of using NLP resources from neighboring languages for low-resourced languages.
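The abstract names vowel-sensitive Levenshtein distance but does not spell out its cost scheme. The following is a minimal sketch of one plausible reading, in which substituting one vowel for another costs less than any other edit. The vowel inventory, the 0.5 vowel-substitution cost, and the function name are assumptions made for illustration and are not taken from the paper.

```python
# Sketch of a vowel-sensitive Levenshtein distance.
# ASSUMPTIONS (not from the paper): the vowel set below, the reduced
# cost of 0.5 for vowel-to-vowel substitutions, and lowercase input.
VOWELS = set("аеиоуыэюяёіїє")


def vowel_sensitive_distance(source: str, target: str,
                             vowel_sub_cost: float = 0.5) -> float:
    """Edit distance where vowel-for-vowel substitutions are cheaper."""
    # prev[j] holds the distance between the processed prefix of
    # `source` and the first j characters of `target`.
    prev = [float(j) for j in range(len(target) + 1)]
    for i, s in enumerate(source, start=1):
        curr = [float(i)]
        for j, t in enumerate(target, start=1):
            if s == t:
                sub = 0.0
            elif s in VOWELS and t in VOWELS:
                sub = vowel_sub_cost  # cheap: vowel replaced by vowel
            else:
                sub = 1.0  # any other substitution costs a full edit
            curr.append(min(prev[j] + 1.0,       # deletion
                            curr[j - 1] + 1.0,   # insertion
                            prev[j - 1] + sub))  # substitution or match
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    # Illustrative strings only: a vowel swap versus a consonant swap.
    print(vowel_sensitive_distance("кот", "кит"))
    print(vowel_sensitive_distance("дом", "том"))
```

Under this weighting, a candidate that differs from a Rusyn form only in a vowel ranks closer than one that differs in a consonant, which is one way a lexicon entry from a related language could be preferred during matching.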

Anthology ID: W17-1405
Volume: Proceedings of the 6th Workshop on Balto-Slavic Natural Language Processing
Month: April
Year: 2017
Address: Valencia, Spain
Editors: Tomaž Erjavec, Jakub Piskorski, Lidia Pivovarova, Jan Šnajder, Josef Steinberger, Roman Yangarber
Venue: BSNLP
SIG: SIGSLAV
Publisher: Association for Computational Linguistics
Pages: 27–32
URL: https://aclanthology.org/W17-1405
DOI: 10.18653/v1/W17-1405
Cite (ACL): Achim Rabus and Yves Scherrer. 2017. Lexicon Induction for Spoken Rusyn – Challenges and Results. In Proceedings of the 6th Workshop on Balto-Slavic Natural Language Processing, pages 27–32, Valencia, Spain. Association for Computational Linguistics.
Cite (Informal): Lexicon Induction for Spoken Rusyn – Challenges and Results (Rabus & Scherrer, BSNLP 2017)
PDF: https://preview.aclanthology.org/ml4al-ingestion/W17-1405.pdf
Data: MULTEXT-East
