A Finite-state Morphological Analyser for Tuvan

Francis Tyers, Aziyana Bayyr-ool, Aelita Salchak, Jonathan Washington


Abstract
This paper describes the development of free/open-source finite-state morphological transducers for Tuvan, a Turkic language spoken in and around the Tuvan Republic in Russia. The finite-state toolkit used for the work is the Helsinki Finite-State Toolkit (HFST); we use the lexc formalism for modelling the morphotactics and the twol formalism for modelling morphophonological alternations. We present a novel description of the morphological combinatorics of pseudo-derivational morphemes in Tuvan. An evaluation shows that the transducer has reasonable coverage (around 93%) on freely available corpora of the language and high precision (over 99%) on a manually verified test set.
Anthology ID:
L16-1407
Volume:
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
Month:
May
Year:
2016
Address:
Portorož, Slovenia
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Sara Goggi, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Helene Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
2562–2567
URL:
https://aclanthology.org/L16-1407
Cite (ACL):
Francis Tyers, Aziyana Bayyr-ool, Aelita Salchak, and Jonathan Washington. 2016. A Finite-state Morphological Analyser for Tuvan. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), pages 2562–2567, Portorož, Slovenia. European Language Resources Association (ELRA).
Cite (Informal):
A Finite-state Morphological Analyser for Tuvan (Tyers et al., LREC 2016)
PDF:
https://preview.aclanthology.org/nschneid-patch-1/L16-1407.pdf